
Ingest CLI

This package maps user input from a CLI to the underlying ingest code in order to run a small ETL pipeline.

Design Reference

cli.py is the main entrypoint for running the CLI. Its key concern is the interaction between all source and destination connectors.

To run the CLI manually:

PYTHONPATH=. python unstructured/ingest/v2/main.py --help

The main.py file simply wraps the generated Click command created in cli.py.

Source Commands

All source commands are added as subcommands of the parent ingest Click group. This lets each command map to a different connector, with a mix of shared and connector-specific parameters.
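
As a rough illustration of this layout, the sketch below registers two source commands on a parent Click group, each with one shared option and one option of its own. The command names, option names, and the helpers `shared_options`, `make_source_command`, and `run` are hypothetical and chosen only for this example; they are not taken from cli.py.

```python
import click
from click.testing import CliRunner


@click.group(name="ingest")
def ingest() -> None:
    """Parent group that every source command is attached to."""


def shared_options(cmd: click.Command) -> click.Command:
    # Hypothetical option that every source command has in common.
    cmd.params.append(click.Option(["--verbose"], is_flag=True, default=False))
    return cmd


def make_source_command(name: str, unique_options: list) -> click.Command:
    # Build one source command from its connector-specific options.
    def callback(**kwargs):
        click.echo(f"{name}: {sorted(kwargs.items())}")

    cmd = click.Command(name=name, callback=callback, params=list(unique_options))
    return shared_options(cmd)


# Illustrative connectors only; the real connectors and their options differ.
ingest.add_command(
    make_source_command("local", [click.Option(["--input-path"], required=True)])
)
ingest.add_command(
    make_source_command("s3", [click.Option(["--remote-url"], required=True)])
)


def run(args):
    # Invoke the group in-process and return (exit_code, output).
    result = CliRunner().invoke(ingest, args)
    return result.exit_code, result.output
```

Calling `run(["local", "--input-path", "docs"])` dispatches to the `local` command, while `run(["s3"])` fails because the option specific to that command is missing.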

Destination Commands

All destination commands are added as subcommands of each parent source command. This lets each invocation of a source subcommand display all possible destination subcommands. The code in utils.py restructures the help text generated by the Click library so that it reads more intuitively for this approach (e.g. it lists the subcommands as Destinations).
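
A minimal sketch of this nesting, assuming each source command is itself a Click group so that destination commands can hang off it. All names here (`make_source`, `make_destination`, `--input-path`, `--output-dir`) are hypothetical, and the help-text customization done in utils.py is not reproduced.

```python
import click
from click.testing import CliRunner


@click.group(name="ingest")
def ingest() -> None:
    """Parent group for all source commands."""


def make_destination(name: str) -> click.Command:
    @click.command(name=name)
    @click.option("--output-dir", default="out")
    @click.pass_context
    def destination(ctx: click.Context, output_dir: str) -> None:
        # The parent context holds the options parsed for the source command.
        source_params = ctx.parent.params
        click.echo(
            f"{ctx.parent.info_name} -> {name}: "
            f"{source_params['input_path']} -> {output_dir}"
        )

    return destination


def make_source(name: str, destinations: list) -> click.Group:
    # invoke_without_command lets the source run with no destination given.
    @click.group(name=name, invoke_without_command=True)
    @click.option("--input-path", required=True)
    @click.pass_context
    def source(ctx: click.Context, input_path: str) -> None:
        if ctx.invoked_subcommand is None:
            click.echo(f"{name} only: {input_path}")

    # Every source gets its own copy of each destination subcommand.
    for destination_name in destinations:
        source.add_command(make_destination(destination_name))
    return source


ingest.add_command(make_source("local", ["s3", "azure"]))

runner = CliRunner()
with_destination = runner.invoke(
    ingest, ["local", "--input-path", "docs", "s3", "--output-dir", "bucket"]
)
source_only = runner.invoke(ingest, ["local", "--input-path", "docs"])
```

Because the destinations are registered on the source group, running the source command with `--help` lists them under Click's default "Commands" heading; relabeling that section is the kind of adjustment the text above attributes to utils.py.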

Configs

The configs in configs/ and the connector-specific ones in cmds/ surface all the user parameters needed to marshal the input dictionary from Click into the respective configs required for a full pipeline run. Because Click returns a flat dictionary of user inputs, the extract_config method in utils.py helps deserialize this dictionary into dataclasses that have nested fields (such as access configs).
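
The idea of turning a flat dictionary into nested dataclasses can be sketched as follows. This is not the extract_config implementation from utils.py; `extract_nested_config`, `AccessConfig`, and `ConnectionConfig` are hypothetical names, and the sketch assumes field names are unique across the nested dataclasses.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Dict, Optional


@dataclass
class AccessConfig:
    # Sensitive values are grouped in their own nested dataclass.
    api_key: Optional[str] = None


@dataclass
class ConnectionConfig:
    access_config: AccessConfig
    host: str = "localhost"


def extract_nested_config(flat: Dict[str, Any], config_cls: type) -> Any:
    """Build config_cls from a flat dict, recursing into dataclass-typed fields."""
    kwargs = {}
    for field in fields(config_cls):
        if is_dataclass(field.type):
            # Nested config: pull its fields out of the same flat dictionary.
            kwargs[field.name] = extract_nested_config(flat, field.type)
        elif field.name in flat:
            kwargs[field.name] = flat[field.name]
    # Keys that match no field (e.g. options for other configs) are ignored.
    return config_cls(**kwargs)


flat_inputs = {"api_key": "secret", "host": "example.org", "verbose": True}
config = extract_nested_config(flat_inputs, ConnectionConfig)
```

Here `config.access_config.api_key` ends up as `"secret"` and `config.host` as `"example.org"`, while the unrelated `verbose` key is left for whichever other config declares it.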