labcore.data.datadict#
datadict.py :
Data classes we use throughout the plottr package, and tools to work on them.
Module Attributes

- str2dd: shortcut to datastructure_from_string().

Functions

- combine_datadicts: Try to make one datadict out of multiple.
- datadict_to_meshgrid: Try to make a meshgrid from a dataset.
- datasets_are_equal: Check whether two datasets are equal.
- datastructure_from_string: Construct a DataDict from a string description.
- dd2df: Make a pandas DataFrame from a datadict.
- dd2xr: Make an xarray Dataset from a MeshgridDataDict.
- guess_shape_from_datadict: Try to guess the shape of the datadict dependents from the axes values.
- is_meta_key: Check if a key is meta information.
- meshgrid_to_datadict: Make a DataDict from a MeshgridDataDict by reshaping the data.
- meta_key_to_name: Convert a meta data key to just the name.
- meta_name_to_key: Convert a name into a meta data key.
- str2dd: shortcut to datastructure_from_string().

Classes

- DataDict: The most basic implementation of the DataDict class.
- DataDictBase: Simple data storage class that is based on a regular dictionary.
- MeshgridDataDict: Implementation of DataDictBase meant to be used when the axes form a grid on which the dependent values reside.

Exceptions

- GriddingError
- class labcore.data.datadict.DataDict(**kw: Any)[source]#
Bases: DataDictBase
The most basic implementation of the DataDict class.
It only enforces that the number of records per data field must be equal for all fields. This refers to the outermost dimension in the case of nested arrays.
The class further implements simple appending of datadicts through the DataDict.append method, as well as allowing addition of DataDict instances.
- add_data(**kw: Any) None[source]#
Add data to all values. New data must be valid in itself.
This method is useful for easily adding data without needing to specify meta data or dependencies.
- Parameters:
kw – one array per data field (none can be omitted).
- append(newdata: DataDict) None[source]#
Append a datadict to this one by appending data values.
- Parameters:
newdata – DataDict to append.
- Raises:
ValueError, if the structures are incompatible.
- expand() DataDict[source]#
Expand nested values in the data fields.
Flattens all value arrays. If nested dimensions are present, all data with non-nested dims will be repeated accordingly – each record is repeated to match the size of the nested dims.
- Returns:
The flattened dataset.
- Raises:
ValueError if the data is not expandable.
- is_expandable() bool[source]#
Determine if the DataDict can be expanded.
Expansion flattens all nested data values to a 1D array. For doing so, we require that all data fields with nested/inner dimensions (i.e., inside the record level) share the same inner shape. In other words, all data fields must be of shape (N,) or (N, (shape)), where shape is common to all fields whose shape is not equal to (N,).
- Returns:
True if expandable, False otherwise.
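The (N,) vs. (N, (shape)) rule above can be illustrated with a small, self-contained numpy sketch. This is not the library code, just the gist of the expansion: flat fields have each record repeated to the nested size, nested fields are flattened.

```python
import numpy as np

def expand_records(fields: dict) -> dict:
    """Illustrative sketch of DataDict.expand (not the library code).

    `fields` maps names to arrays of shape (N,) or (N, *inner), where the
    inner shape is common to all nested fields. Nested fields are
    flattened; flat fields have each record repeated to match.
    """
    # elements per record in the nested fields (1 if none are nested)
    inner_sizes = {int(np.prod(v.shape[1:])) for v in fields.values() if v.ndim > 1}
    if len(inner_sizes) > 1:
        raise ValueError("nested fields do not share a common inner shape")
    reps = inner_sizes.pop() if inner_sizes else 1
    out = {}
    for name, values in fields.items():
        if values.ndim > 1:
            out[name] = values.reshape(-1)       # flatten nested values
        else:
            out[name] = np.repeat(values, reps)  # repeat each flat record
    return out
```

For example, an axis of shape (2,) paired with a dependent of shape (2, 3) expands to two flat arrays of length 6.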
- is_expanded() bool[source]#
Determine if the DataDict is expanded.
- Returns:
True if expanded, False if not.
- nrecords() int | None[source]#
Gets the number of records in the dataset.
- Returns:
The number of records in the dataset.
- remove_invalid_entries() DataDict[source]#
Remove all rows that are None or np.nan in all dependents.
- Returns:
The cleaned DataDict.
- class labcore.data.datadict.DataDictBase(**kw: Any)[source]#
Bases: dict
Simple data storage class that is based on a regular dictionary.
This base class does not make assumptions about the structure of the values. This is implemented in inheriting classes.
- add_meta(key: str, value: Any, data: str | None = None) None[source]#
Add meta info to the dataset.
If the key already exists, meta info will be overwritten.
- Parameters:
key – Name of the meta field (without underscores).
value – Value of the meta information.
data – If None, meta will be global; otherwise assigned to data field data.
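A minimal sketch of this behavior, assuming the double-underscore storage convention shown by meta_name_to_key below; here dd is a plain dict standing in for the dataset, which is an illustrative assumption about the internal layout:

```python
from typing import Any, Optional

def add_meta(dd: dict, key: str, value: Any, data: Optional[str] = None) -> None:
    """Sketch of DataDictBase.add_meta using the '__key__' storage convention.

    Globally, meta lives as dd['__key__']; for a data field it lives inside
    that field's entry, e.g. dd['x']['__key__']. Illustrative only.
    """
    meta_key = f"__{key}__"
    if data is None:
        dd[meta_key] = value        # global meta; existing keys are overwritten
    else:
        dd[data][meta_key] = value  # meta attached to one data field
```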
- astype(dtype: dtype) T[source]#
Convert all data values to given dtype.
- Parameters:
dtype – np dtype.
- Returns:
Dataset, with values as given type (not a copy)
- axes(data: Sequence[str] | str | None = None) List[str][source]#
Return a list of axes.
- Parameters:
data – If None, return all axes present in the dataset; otherwise only the axes of the dependent data.
- Returns:
The list of axes.
- axes_are_compatible() bool[source]#
Check if all dependent data fields have the same axes.
This includes axes order.
- Returns:
True or False.
- clear_meta(data: str | None = None) None[source]#
Deletes all meta data.
- Parameters:
data – If not None, delete meta only from the specified data field data. Else, deletes all top-level meta as well as meta for all data fields.
- data_items() Iterator[Tuple[str, Dict[str, Any]]][source]#
Generator for data field items.
Like dict.items(), but ignores meta data.
- Returns:
Generator yielding first the key of the data field and second its value.
- data_vals(key: str) ndarray[source]#
Return the data values of field key.
Equivalent to DataDict['key'].values.
- Parameters:
key – Name of the data field.
- Returns:
Values of the data field.
- delete_meta(key: str, data: str | None = None) None[source]#
Deletes specific meta data.
- Parameters:
key – Name of the meta field to remove.
data – If None, this affects global meta; otherwise remove from data field data.
- dependents() List[str][source]#
Get all dependents in the dataset.
- Returns:
A list of the names of dependents.
- extract(data: List[str], include_meta: bool = True, copy: bool = True, sanitize: bool = True) T[source]#
Extract data from a dataset.
Return a new datadict with all fields specified in data included. Will also take along any axes fields that have not been explicitly specified. Will return empty if data consists of only axes fields.
- Parameters:
data – Data field or list of data fields to be extracted.
include_meta – If True, include the global meta data. Data meta will always be included.
copy – If True, data fields will be deep copies of the original.
sanitize – If True, will run DataDictBase.sanitize before returning.
- Returns:
New DataDictBase containing only requested fields.
- has_meta(key: str) bool[source]#
Check whether meta field exists in the dataset.
- Returns:
True if it exists, False if it doesn’t.
- label(name: str) str | None[source]#
Get the label for a data field. If no label is present returns the name of the data field as the label. If a unit is present, it will be appended at the end in brackets: “label (unit)”.
- Parameters:
name – Name of the data field.
- Returns:
Labelled name.
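The formatting rule can be sketched as follows; fields stands in for the datadict's field entries (a plain-dict assumption made only for illustration):

```python
def field_label(name: str, fields: dict) -> str:
    """Sketch of DataDictBase.label: fall back to the field name when no
    label is present, and append the unit in brackets when there is one.
    `fields` maps field names to dicts that may carry 'label' and 'unit'.
    """
    entry = fields[name]
    lbl = entry.get("label") or name   # no label: use the field name
    unit = entry.get("unit")
    return f"{lbl} ({unit})" if unit else lbl
```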
- mask_invalid() T[source]#
Mask all invalid data in all values.
- Returns:
Copy of the dataset with invalid entries (nan/None) masked.
- meta_items(data: str | None = None, clean_keys: bool = True) Iterator[Tuple[str, Dict[str, Any]]][source]#
Generator for meta items.
Like dict.items(), but yields only meta entries. The keys returned do not contain the underscores used internally.
- Parameters:
data – If None, iterate over global meta data. If it is the name of a data field, iterate over the meta information of that field.
clean_keys – If True, remove the underscore pre/suffix.
- Returns:
Generator yielding first the key of the meta field and second its value.
- meta_val(key: str, data: str | None = None) Any[source]#
Return the value of meta field key (given without underscores).
- Parameters:
key – Name of the meta field.
data – None for global meta; name of data field for data meta.
- Returns:
The value of the meta information.
- nbytes(name: str | None = None) int | None[source]#
Get the size of data.
- Parameters:
name – Name of the data field. If None, return the size of the entire datadict.
- Returns:
size in bytes.
- remove_unused_axes() T[source]#
Removes axes not associated with dependents.
- Returns:
Cleaned dataset.
- reorder_axes(data_names: Sequence[str] | str | None = None, **pos: int) T[source]#
Reorder data axes.
- Parameters:
data_names – Data name(s) for which to reorder the axes. If None, apply to all dependents.
pos – New axes position in the form
axis_name = new_position. Non-specified axes positions are adjusted automatically.
- Returns:
Dataset with re-ordered axes (not a copy)
- reorder_axes_indices(name: str, **pos: int) Tuple[Tuple[int, ...], List[str]][source]#
Get the indices that can reorder axes in a given way.
- Parameters:
name – Name of the data field of which we want to reorder axes.
pos – New axes position in the form
axis_name = new_position. Non-specified axes positions are adjusted automatically.
- Returns:
The tuple of new indices, and the list of axes names in the new order.
- static same_structure(*data: T, check_shape: bool = False) bool[source]#
Check if all supplied DataDicts share the same data structure (i.e., dependents and axes).
Ignores meta data and values. Checks also for matching shapes if check_shape is True.
- Parameters:
data – The data sets to compare.
check_shape – Whether to include shape check in the comparison.
- Returns:
True if the structure matches for all, else False.
- set_meta(key: str, value: Any, data: str | None = None) None#
Add meta info to the dataset.
If the key already exists, meta info will be overwritten.
- Parameters:
key – Name of the meta field (without underscores).
value – Value of the meta information.
data – If None, meta will be global; otherwise assigned to data field data.
- shapes() Dict[str, Tuple[int, ...]][source]#
Get the shapes of all data fields.
- Returns:
A dictionary of the form {key: shape}, where shape is the np.shape-tuple of the data with name key.
- structure(add_shape: bool = False, include_meta: bool = True, same_type: bool = False, remove_data: List[str] | None = None) T | None[source]#
Get the structure of the DataDict.
Return the datadict without values (value omitted in the dict).
- Parameters:
add_shape – Deprecated – ignored.
include_meta – If True, include the meta information in the returned dict.
same_type – If True, return type will be the one of the object this is called on. Else, DataDictBase.
remove_data – any data fields listed will be removed from the result, also when listed in any axes.
- Returns:
The DataDict containing the structure only. The exact type is the same as the type of self.
- static to_records(**data: Any) Dict[str, ndarray][source]#
Convert data to records that can be added to the DataDict. All data is converted to np.array and reshaped such that the first dimension of all resulting arrays has the same length (chosen to be the smallest possible number that does not alter any shapes beyond adding a length-1 dimension as the first dimension, if necessary).
If a data field is given as None, it will be converted to numpy.array([numpy.nan]).
- Parameters:
data – Keyword arguments, one per data field, each carrying the data for that field.
- Returns:
Dictionary with properly shaped data.
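A simplified sketch of this reshaping. The real method chooses the smallest consistent record count; this sketch only distinguishes the already-consistent case from the single-record fallback, which is enough to show the idea:

```python
import numpy as np

def to_records_sketch(**data):
    """Simplified sketch of DataDict.to_records (illustrative only).

    Each value becomes an np.array; None becomes np.array([np.nan]). If the
    first dimensions already agree, shapes are kept; otherwise every array
    is treated as a single record by prepending a length-1 dimension.
    """
    arrs = {k: np.array([np.nan]) if v is None else np.asarray(v)
            for k, v in data.items()}
    # first-dimension lengths; None marks scalar (0-d) entries
    firsts = {a.shape[0] if a.ndim else None for a in arrs.values()}
    if len(firsts) == 1 and None not in firsts:
        return arrs  # already consistently shaped
    # fallback: each field is one record
    return {k: a.reshape((1,) + a.shape) for k, a in arrs.items()}
```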
- validate() bool[source]#
Check the validity of the dataset.
- Checks performed:
All axes specified with dependents must exist as data fields.
- Other tasks performed:
unit keys are created if omitted.
label keys are created if omitted.
shape meta information is updated with the correct values (only if already present).
- Returns:
True if valid, False if invalid.
- Raises:
ValueError if invalid.
- exception labcore.data.datadict.GriddingError[source]#
Bases:
ValueError
- class labcore.data.datadict.MeshgridDataDict(**kw: Any)[source]#
Bases: DataDictBase
Implementation of DataDictBase meant to be used when the axes form a grid on which the dependent values reside.
It enforces that all dependents have the same axes and that all shapes are identical.
- mean(axis: str) MeshgridDataDict[source]#
Take the mean over the given axis.
- Parameters:
axis – which axis to take the average over.
- Returns:
Data, averaged over axis.
- reorder_axes(data_names: Sequence[str] | str | None = None, **pos: int) MeshgridDataDict[source]#
Reorder the axes for all data.
This includes transposing the data, since we’re on a grid.
- Parameters:
data_names – Which dependents to include. If None, all dependents are included.
pos – New axes position in the form axis_name = new_position. Non-specified axes positions are adjusted automatically.
- Returns:
Dataset with re-ordered axes.
- shape() None | Tuple[int, ...][source]#
Return the shape of the meshgrid.
- Returns:
The shape as a tuple. None if there is no data in the set.
- labcore.data.datadict.combine_datadicts(*dicts: DataDict) DataDictBase | DataDict[source]#
Try to make one datadict out of multiple.
Basic rules:
We try to maintain the input type.
Return type is ‘downgraded’ to DataDictBase if the contents are not compatible (e.g., different numbers of records in the inputs).
- Returns:
Combined data.
- labcore.data.datadict.datadict_to_meshgrid(data: DataDict, target_shape: Tuple[int, ...] | None = None, inner_axis_order: None | Sequence[str] = None, use_existing_shape: bool = False, copy: bool = True) MeshgridDataDict[source]#
Try to make a meshgrid from a dataset.
- Parameters:
data – Input DataDict.
target_shape – Target shape. If None, we use guess_shape_from_datadict to infer it.
inner_axis_order – If the axes of the datadict are not specified in ‘C’ order (first the slowest, last the fastest axis), then the ‘true’ inner order can be specified as a list of axes names, which has to match the specified axes in all but order. The data is then transposed to conform to the specified order.
Note
If this is given, then target_shape needs to be given in the order of this inner_axis_order. The output data will keep the axis ordering specified in the axes property.
use_existing_shape – If True, simply use the shape that the data already has. For numpy-array data, this might already be present. If False, flatten and reshape.
copy – If True, we make a copy of the data arrays. If False, the data arrays are modified in place.
- Raises:
GriddingError (subclass of ValueError) if the data cannot be gridded.
- Returns:
The generated MeshgridDataDict.
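The core reshape step can be sketched with plain numpy, assuming flat, C-ordered records and a known target shape; the field names and the ValueError (the library raises GriddingError, a ValueError subclass) are illustrative:

```python
import numpy as np

def flat_to_meshgrid(data: dict, shape: tuple) -> dict:
    """Sketch of the reshape behind datadict_to_meshgrid.

    `data` maps field names to flat arrays of length prod(shape), recorded
    in 'C' order (first axis slowest); each is reshaped onto the grid.
    """
    n = int(np.prod(shape))
    out = {}
    for name, values in data.items():
        values = np.asarray(values)
        if values.size != n:
            # stand-in for GriddingError
            raise ValueError(f"{name}: cannot grid {values.size} records "
                             f"onto shape {shape}")
        out[name] = values.reshape(shape)
    return out
```

For a 3 x 2 sweep where x is the slow axis, x repeats each value twice and y cycles through its two values three times.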
- labcore.data.datadict.datasets_are_equal(a: DataDictBase, b: DataDictBase, ignore_meta: bool = False) bool[source]#
Check whether two datasets are equal.
Compares type, structure, and content of all fields.
- Parameters:
a – First dataset.
b – Second dataset.
ignore_meta – If True, do not verify whether the metadata matches.
- Returns:
True or False.
- labcore.data.datadict.datastructure_from_string(description: str) DataDict[source]#
Construct a DataDict from a string description.
Examples
- "data[mV](x, y)" results in a datadict with one dependent data with unit mV and two independents, x and y, that do not have units.
- "data_1[mV](x, y); data_2[mA](x); x[mV]; y[nT]" results in two dependents, one of them depending on x and y, the other only on x. Note that x and y have units. We can (but do not have to) omit them when specifying the dependencies.
- "data_1[mV](x[mV], y[nT]); data_2[mA](x[mV])". Same result as the previous example.
- Rules:
We recognize descriptions of the form field1[unit1](ax1, ax2, ...); field2[unit2](...); ....
Field names (like field1 and field2 above) have to start with a letter, and may contain word characters.
Field descriptors consist of the name, an optional unit (presence signified by square brackets), and optional dependencies (presence signified by round brackets).
Dependencies (axes) are implicitly recognized as fields (and thus have the same naming restrictions as field names).
Axes are separated by commas.
Axes may have a unit when specified as a dependency, but besides the name, square brackets, and commas, no other characters are recognized within the round brackets that specify the dependencies.
In addition to being specified as a dependency for a field, axes may also be specified as additional fields without dependencies, for instance to specify the unit (which may simplify the string). For example, z1(x, y); z2(x, y); x[V]; y[V].
Units may only consist of word characters.
Use of unexpected characters will result in ignoring the part that contains the symbol.
The regular expression used to find field descriptors is:
((?<=\A)|(?<=\;))[a-zA-Z]+\w*(\[\w*\])?(\(([a-zA-Z]+\w*(\[\w*\])?\,?)*\))?
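The pattern can be exercised directly with Python's re module. Stripping whitespace first is an assumption made here for illustration, since the pattern itself does not admit spaces after semicolons or commas:

```python
import re

# The field-descriptor pattern quoted above, used to show how a
# description string splits into descriptors.
FIELD_RE = re.compile(
    r"((?<=\A)|(?<=\;))[a-zA-Z]+\w*(\[\w*\])?"
    r"(\(([a-zA-Z]+\w*(\[\w*\])?\,?)*\))?"
)

def find_field_descriptors(description: str) -> list:
    """Return the raw field descriptors recognized in `description`,
    after removing whitespace (an illustrative preprocessing step)."""
    stripped = re.sub(r"\s", "", description)
    return [m.group(0) for m in FIELD_RE.finditer(stripped)]
```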
- labcore.data.datadict.dd2df(dd: DataDict)[source]#
Make a pandas DataFrame from a datadict. Uses a MultiIndex, and assumes that all data fields are compatible.
- Parameters:
dd (DataDict) – source data
- Returns:
pandas DataFrame
- Return type:
DataFrame
- labcore.data.datadict.dd2xr(dd: MeshgridDataDict) Dataset[source]#
Makes an xarray Dataset from a MeshgridDataDict.
- TODO: currently only supports ‘regular’ grids, i.e., all axes are independent of each other and can be represented by 1d arrays. For each axis, the first slice is used as coordinate values.
- Parameters:
dd (MeshgridDataDict) – input data
- Returns:
xarray Dataset
- Return type:
xr.Dataset
- labcore.data.datadict.guess_shape_from_datadict(data: DataDict) Dict[str, None | Tuple[List[str], Tuple[int, ...]]][source]#
Try to guess the shape of the datadict dependents from the axes values.
- Parameters:
data – Dataset to examine.
- Returns:
A dictionary with the dependents as keys and the inferred shapes as values. The value is None if the shape could not be inferred.
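A deliberately naive sketch of such an inference; the library's algorithm is more careful, and this only shows the idea of reading a grid shape off flat axis values:

```python
import numpy as np

def naive_grid_shape(*axes_values):
    """Naive shape guess: if the flat axis arrays came from a complete,
    regular C-ordered sweep, the grid shape is the number of unique
    values per axis. Returns None when the record count does not match
    (i.e., the sweep is incomplete or irregular).
    """
    shape = tuple(len(np.unique(v)) for v in axes_values)
    n = int(np.prod(shape))
    if any(np.asarray(v).size != n for v in axes_values):
        return None  # not a complete grid
    return shape
```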
- labcore.data.datadict.is_meta_key(key: str) bool[source]#
Checks if key is meta information.
- Parameters:
key – The key we are checking.
- Returns:
True if it is, False if it isn’t.
- labcore.data.datadict.meshgrid_to_datadict(data: MeshgridDataDict) DataDict[source]#
Make a DataDict from a MeshgridDataDict by reshaping the data.
- Parameters:
data – Input MeshgridDataDict.
- Returns:
Flattened DataDict.
- labcore.data.datadict.meta_key_to_name(key: str) str[source]#
Converts a meta data key to just the name. E.g., for key “__meta__” this returns “meta”.
- Parameters:
key – The key that is being converted
- Returns:
The name of the key.
- Raises:
ValueError if the key is not a meta key.
- labcore.data.datadict.meta_name_to_key(name: str) str[source]#
Converts name into a meta data key. E.g., “meta” gets converted to “__meta__”.
- Parameters:
name – The name that is being converted.
- Returns:
The meta data key based on name.
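The “meta” to “__meta__” convention shared by is_meta_key, meta_key_to_name, and meta_name_to_key can be sketched as follows (edge-case behavior is an assumption; only the wrapping convention is taken from the docs):

```python
def is_meta_key(key: str) -> bool:
    """A meta key is wrapped in double underscores, e.g. '__meta__'."""
    return key.startswith("__") and key.endswith("__")

def meta_name_to_key(name: str) -> str:
    """'meta' -> '__meta__'."""
    return f"__{name}__"

def meta_key_to_name(key: str) -> str:
    """'__meta__' -> 'meta'; raises ValueError for non-meta keys."""
    if not is_meta_key(key):
        raise ValueError(f"{key!r} is not a meta key")
    return key[2:-2]
```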
- labcore.data.datadict.str2dd(description: str) DataDict#
shortcut to datastructure_from_string().