mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-03 23:20:35 +00:00

Closes #1782 This PR: - Extends ingest pipeline so that it is possible to select an embedding provider from a range of providers - Modifies the ingest embedding test to be a diff test, since the embedding vectors are reproducible after supporting multiple providers Additional info on the chosen provider for the test: - Found `langchain.embeddings.HuggingFaceEmbeddings` to be deterministic even when there's no seed set - Took 6.84s to pass a unit test with the provider (without cache, including model download) - `langchain.embeddings.HuggingFaceEmbeddings` runs in local, making it zero cost For all these reasons, testing embedding modules with the Huggingface model seems to be making sense --------- Co-authored-by: cragwolfe <crag@unstructured.io> Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com> Co-authored-by: ahmetmeleq <ahmetmeleq@users.noreply.github.com>