2022-06-29 14:35:19 -04:00
[metadata]
license_files = LICENSE.md
[flake8]
2024-06-13 11:19:42 -07:00
ignore = E203,E704,W503
2022-06-29 14:35:19 -04:00
max-line-length = 100
2023-10-17 08:45:12 -04:00
exclude =
    .venv
2023-12-01 12:56:31 -08:00
    unstructured-inference
2023-12-05 11:42:23 -05:00
per-file-ignores =
    *: T20
2022-06-29 14:35:19 -04:00
[tool:pytest]
filterwarnings =
    ignore::DeprecationWarning
2023-09-19 15:32:46 -07:00
python_classes = Test Describe
python_functions = test_ it_ they_ but_ and_
2023-10-31 16:02:00 -05:00
markers =
    chipper: mark a test as running chipper, which tends to be slow and compute-heavy.
Dynamic ElementMetadata implementation (#2043)
### Executive Summary
The structure of element metadata is currently static, meaning only
predefined fields can appear in the metadata. We would like the
flexibility for end-users, at their own discretion, to define and use
additional metadata fields that make sense for their particular
use-case.
### Concepts
A key concept for dynamic metadata is the _known field_. A known field is
one of those explicitly defined on `ElementMetadata`. Each of these has
a type and can be specified when _constructing_ a new `ElementMetadata`
instance. This is in contrast to an _end-user defined_ (or _ad-hoc_)
metadata field, one not known at "compile" time and added at the
discretion of an end-user to suit the purposes of their application.
An ad-hoc field can only be added by _assignment_ on an already
constructed instance.
### End-user ad-hoc metadata field behaviors
An ad-hoc field can be added to an `ElementMetadata` instance by
assignment:
```python
>>> metadata = ElementMetadata()
>>> metadata.coefficient = 0.536
```
A field added in this way can be accessed by name:
```python
>>> metadata.coefficient
0.536
```
and that field will appear in the JSON/dict for that instance:
```python
>>> metadata = ElementMetadata()
>>> metadata.coefficient = 0.536
>>> metadata.to_dict()
{'coefficient': 0.536}
```
However, accessing an ad-hoc field that has _not_ been assigned
on that instance raises `AttributeError`:
```python
>>> metadata.coeffcient # -- misspelled "coefficient" --
AttributeError: 'ElementMetadata' object has no attribute 'coeffcient'
```
This makes "tagging" a metadata item with a value very convenient, with
one caveat: an end-user who adds a metadata field to only _some_
elements (sparse population) AND wants to access that field by name on
ANY element, receiving `None` where it has not been assigned, will need
to use an expression like this:
```python
coefficient = metadata.coefficient if hasattr(metadata, "coefficient") else None
```
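The same sparse-access pattern can be written with the built-in `getattr` and a default value. This is a minimal sketch, not part of the change itself: `types.SimpleNamespace` stands in for `ElementMetadata` so the snippet runs without the library installed, and the helper name `ad_hoc_field` is hypothetical.

```python
from types import SimpleNamespace
from typing import Any


def ad_hoc_field(metadata: Any, name: str, default: Any = None) -> Any:
    """Return an ad-hoc field if it was assigned on `metadata`, else `default`."""
    # getattr() with a default absorbs the AttributeError raised for unassigned names.
    return getattr(metadata, name, default)


# Stand-in objects (assumption): one "tagged" with a coefficient, one left untouched.
tagged = SimpleNamespace(coefficient=0.536)
untagged = SimpleNamespace()

coefficients = [ad_hoc_field(m, "coefficient") for m in (tagged, untagged)]
```

Here `coefficients` holds the assigned value for the tagged object and `None` for the untagged one, which is the behavior a sparsely populated field usually needs.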
### Implementation Notes
- **ad-hoc metadata fields** are discarded during consolidation (for
chunking) because no consolidation strategy is defined for them. We
could consider using a default consolidation strategy like `FIRST`, or
possibly allow a user to register a strategy (although that gets hairy
in non-private and multiple-memory-space situations).
- ad-hoc metadata field names **cannot start with an underscore**.
- We have no way to distinguish an ad-hoc field from any "noise" fields
that might appear in a JSON/dict loaded using `.from_dict()`, so unlike
the original (which only loaded known-fields), we'll rehydrate anything
that we find there.
- No real type-safety is possible on ad-hoc fields but the type-checker
does not complain because the type of all ad-hoc fields is `Any` (which
is the best available behavior in my view).
- We may want to consider whether end-users should be able to add ad-hoc
fields to "sub" metadata objects too, like `DataSourceMetadata` and
conceivably `CoordinatesMetadata` (although I'm not immediately seeing a
use-case for the second one).
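The behaviors described in these notes can be illustrated with a small stand-in class. This is a sketch under stated assumptions, not the library's implementation: `SketchMetadata` and its two known fields are hypothetical, and omitting `None` values from `to_dict()` is inferred from the example output shown earlier.

```python
from typing import Any, Dict, Optional


class SketchMetadata:
    """Hypothetical stand-in for `ElementMetadata`; not the library's implementation."""

    def __init__(self, filename: Optional[str] = None, page_number: Optional[int] = None):
        # Known fields: typed, and accepted by the constructor.
        self.filename = filename
        self.page_number = page_number

    def __setattr__(self, name: str, value: Any) -> None:
        # Field names cannot start with an underscore.
        if name.startswith("_"):
            raise AttributeError(f"field name cannot start with an underscore: {name!r}")
        super().__setattr__(name, value)

    def to_dict(self) -> Dict[str, Any]:
        # Known and ad-hoc fields serialize alike; unset (None) fields are omitted (assumption).
        return {name: value for name, value in vars(self).items() if value is not None}

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "SketchMetadata":
        # Ad-hoc fields cannot be told apart from "noise" keys, so every key is rehydrated.
        metadata = cls()
        for name, value in data.items():
            setattr(metadata, name, value)
        return metadata


meta = SketchMetadata(filename="report.pdf")
meta.coefficient = 0.536  # ad-hoc field added by assignment
restored = SketchMetadata.from_dict(meta.to_dict())
```

Because the sketch leans on ordinary Python attribute lookup, reading a name that was never assigned raises `AttributeError` without any extra code, matching the misspelling example above.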
2023-11-15 13:22:15 -08:00
testpaths =
    test_unstructured
    test_unstructured_ingest
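With `python_classes = Test Describe` and `python_functions = test_ it_ they_ but_ and_`, pytest collects classes and functions whose names start with any of those prefixes, which allows BDD-style test names. A sketch of a test module that would be collected under these settings follows; the class and test names are illustrative and not taken from the repository.

```python
class DescribeStringNormalizer:
    """Collected because the class name starts with `Describe`."""

    def it_strips_surrounding_whitespace(self):
        # Collected via the `it_` prefix.
        assert "  title  ".strip() == "title"

    def but_it_preserves_internal_whitespace(self):
        # Collected via the `but_` prefix.
        assert "  a  b ".strip() == "a  b"


def test_plain_function_names_are_still_collected():
    # The conventional `test_` prefix remains in the list, so this is collected too.
    assert "abc".upper() == "ABC"
```

A slow test can additionally be tagged with `@pytest.mark.chipper` (after `import pytest`) and deselected with `pytest -m "not chipper"`, using the `chipper` marker registered under `markers`.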
2023-10-17 08:45:12 -04:00
[autoflake]
expand_star_imports=true
2023-10-20 10:00:19 -04:00
ignore_pass_statements=false
2023-10-17 08:45:12 -04:00
recursive=true
quiet=true
remove_all_unused_imports=true
remove_unused_variables=true