mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2026-01-05 11:50:51 +00:00
### Description All other connectors that were not included in https://github.com/Unstructured-IO/unstructured/pull/2194 are now updated to follow the new pattern and mark any variables as sensitive where it makes sense. Core changes: * All connectors now support an `AccessConfig` to mark data that's needed for auth (i.e. username, password) and those that are sensitive are designated appropriately using the new enhanced field. * All cli configs on the cli definition now inherit from the base config in the connector file to reuse the variables set on that dataclass * The base writer class was updated to better generalize the new approach given better use of dataclasses * The base cli classes were refactored to also take into account the need for a connector and write config when creating the respective runner/writer classes. * Any mismatch between the cli field name and the dataclass field name were updated on the dataclass side to not impact the user but maintain consistency * Add custom redaction logic for mongodb URIs since the password is expected to be a part of it. Now this: `"mongodb+srv://ingest-test-user:r4hK3BD07b@ingest-test.hgaig.mongodb.net/"` -> `"mongodb+srv://ingest-test-user:***REDACTED***@ingest-test.hgaig.mongodb.net/"` in the logs * Bundle all fsspec based files into their own packages. * Refactor custom `_decode_dataclass` used for enhanced json mixin by using a monkey-patch approach. The original approach was breaking on optional nested dataclasses when serializing since the other methods in `dataclasses_json_core` weren't using the new method. By monkey-patching the original method with a new one, all other methods in that library would use the new one. --------- Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com> Co-authored-by: rbiseck3 <rbiseck3@users.noreply.github.com>
18 lines
368 B
JSON
18 lines
368 B
JSON
[
|
|
{
|
|
"element_id": "f931bdb912a40a788890924578a0cff7",
|
|
"metadata": {
|
|
"data_source": {
|
|
"date_created": "2023-08-02T20:36:00.000Z",
|
|
"date_modified": "2023-08-17T18:49:00.000Z"
|
|
},
|
|
"filetype": "text/html",
|
|
"languages": [
|
|
"eng"
|
|
],
|
|
"page_number": 1
|
|
},
|
|
"text": "Sprint 1",
|
|
"type": "Title"
|
|
}
|
|
] |