Fixes https://github.com/Unstructured-IO/unstructured-api/issues/237
The problem:
The `ElementMetadata` class could not ignore fields that it didn't
know about. This surfaced in `partition_via_api` when the hosted API
schema is newer than the local `unstructured` version. In that case,
`ElementMetadata.from_json()` raises errors such as `TypeError:
__init__() got an unexpected keyword argument 'parent_id'`.
The fix:
The `from_json` methods for these dataclasses should drop any unexpected
fields before calling `__init__`.
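The filtering described above could look roughly like the sketch below. It is not the code from this PR: `ExampleMetadata` and its `from_dict` classmethod are hypothetical stand-ins for `ElementMetadata` and its `from_json` method, and the three fields are taken from the verification example. The only library call it relies on is `dataclasses.fields`.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, Optional


@dataclass
class ExampleMetadata:
    # Hypothetical stand-in for ElementMetadata; the real class has more fields.
    filename: Optional[str] = None
    filetype: Optional[str] = None
    page_number: Optional[int] = None

    @classmethod
    def from_dict(cls, input_dict: Dict[str, Any]) -> "ExampleMetadata":
        # Collect the names of the fields this dataclass actually declares.
        known_fields = {f.name for f in fields(cls)}
        # Drop any key the local schema does not know about before __init__.
        filtered = {k: v for k, v in input_dict.items() if k in known_fields}
        return cls(**filtered)


# "new_field" is unknown to ExampleMetadata and is silently ignored.
metadata = ExampleMetadata.from_dict(
    {"filename": "layout-parser-paper.pdf", "page_number": 1, "new_field": "foo"},
)
print(metadata)
```

Filtering against the declared fields means a newer API schema can add keys without breaking older clients, at the cost of silently discarding data the client does not understand.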
To verify:
The following should run without raising an error:
```
import json

from unstructured.staging.base import elements_from_json

# Simulated API response whose metadata carries a field ("new_field")
# that the local ElementMetadata does not define.
test_api_result = json.dumps([
    {
        "type": "Title",
        "element_id": "2f7cc75f6467bba468022c4c2875335e",
        "metadata": {
            "filename": "layout-parser-paper.pdf",
            "filetype": "application/pdf",
            "page_number": 1,
            "new_field": "foo",
        },
        "text": "LayoutParser: A Unified Toolkit for Deep Learning Based Document Image Analysis"
    }
])

elements = elements_from_json(text=test_api_result)
print(elements)
```
* Apply import sorting
ruff . --select I --fix
* Remove unnecessary open mode parameter
ruff . --select UP015 --fix
* Use f-string formatting rather than .format
* Remove extraneous parentheses
Also use "" instead of str()
* Resolve missing trailing commas
ruff . --select COM --fix
* Rewrite list() and dict() calls using literals
ruff . --select C4 --fix
* Add () to pytest.fixture, use tuples for parametrize, etc.
ruff . --select PT --fix
* Simplify code: merge conditionals, context managers
ruff . --select SIM --fix
* Import without unnecessary alias
ruff . --select PLR0402 --fix
* Apply formatting via black
* Rewrite ValueError somewhat
Slightly unrelated to the rest of the PR
* Apply formatting to tests via black
* Update expected exception message to match
0d81564
* Satisfy E501 line too long in test
* Update changelog & version
* Add ruff to make tidy and test deps
* Run 'make tidy'
* Update changelog & version
* Update changelog & version
* Add ruff to 'check' target
Doing so also required fixing some issues that ruff cannot auto-fix. Two of them are suppressed with a noqa: SIM115; the one in __init__ in particular may need some attention, but that refactor is out of scope for this PR.
* add apply method to apply cleaners to elements
* bump version
* add check for string output
* documentation for the apply method
* change interface to *cleaners
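
The last five commits describe an `apply` method that takes `*cleaners` and checks for string output. A minimal sketch of that shape, under stated assumptions, follows. `ExampleText` is a hypothetical class, not the library's element type, and both the error message and the choice to check after every cleaner are illustrative.

```python
from typing import Callable


class ExampleText:
    """Hypothetical stand-in for a text element; not the library's class."""

    def __init__(self, text: str):
        self.text = text

    def apply(self, *cleaners: Callable[[str], str]) -> None:
        # Run each cleaner in order, feeding the output of one into the next.
        cleaned = self.text
        for cleaner in cleaners:
            cleaned = cleaner(cleaned)
            # Reject cleaners that do not return a string (assumed behavior).
            if not isinstance(cleaned, str):
                raise ValueError("Cleaner produced a non-string output.")
        # Only overwrite the text once every cleaner has succeeded.
        self.text = cleaned


element = ExampleText("  Hello   World  ")
element.apply(str.strip, str.lower)
print(element.text)
```

Accepting `*cleaners` lets callers pass any number of functions positionally, for example `element.apply(str.strip)` or `element.apply(str.strip, str.lower)`, without building a list first.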