mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-04 15:42:16 +00:00

Thanks to Pedro at OctoAI we have a new embedding option.

The following PR adds support for the use of OctoAI embeddings.

Forked from the original OpenAI embeddings class. We removed the use of the LangChain adaptor, and use OpenAI's SDK directly instead.

Also updated out-of-date example script.

Including new test file for OctoAI.

# Testing

Get a token from our platform at: https://www.octoai.cloud/

For testing one can do the following:

```
export OCTOAI_TOKEN=<your octo token>
python3 examples/embed/example_octoai.py
```

## Testing done

Validated running the above script from within a locally built container via `make docker-start-dev`

---------

Co-authored-by: potter-potter <david.potter@gmail.com>
20 lines
868 B
Python
from unstructured.documents.elements import Text
from unstructured.embed.octoai import OctoAiEmbeddingConfig, OctoAIEmbeddingEncoder


def test_embed_documents_does_not_break_element_to_dict(mocker):
    # Mocked client with the desired behavior for embed_documents
    mock_client = mocker.MagicMock()
    mock_client.embed_documents.return_value = [1, 2]

    # Mock create_client to return our mock_client
    mocker.patch.object(OctoAIEmbeddingEncoder, "create_client", return_value=mock_client)

    encoder = OctoAIEmbeddingEncoder(config=OctoAiEmbeddingConfig(api_key="api_key"))
    elements = encoder.embed_documents(
        elements=[Text("This is sentence 1"), Text("This is sentence 2")],
    )
    assert len(elements) == 2
    assert elements[0].to_dict()["text"] == "This is sentence 1"
    assert elements[1].to_dict()["text"] == "This is sentence 2"
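The patching pattern used in the test above can be tried without installing `unstructured` or holding an OctoAI token. The following is a minimal sketch using only the standard library's `unittest.mock`. `FakeElement`, `FakeEncoder`, and `run_with_mocked_client` are hypothetical stand-ins invented for illustration; they are not the real `Text` or `OctoAIEmbeddingEncoder` classes, and the real encoder's internals may differ.

```python
from dataclasses import dataclass, field
from unittest import mock


@dataclass
class FakeElement:
    """Hypothetical stand-in for an element such as Text (assumption)."""

    text: str
    embeddings: list = field(default_factory=list)

    def to_dict(self):
        return {"text": self.text, "embeddings": self.embeddings}


class FakeEncoder:
    """Hypothetical encoder; NOT the real OctoAIEmbeddingEncoder."""

    def create_client(self):
        # A real implementation would build a network client here.
        raise RuntimeError("needs network access and a real token")

    def embed_documents(self, elements):
        client = self.create_client()
        vectors = client.embed_documents([e.text for e in elements])
        # Attach one vector to each element, preserving order.
        for element, vector in zip(elements, vectors):
            element.embeddings = vector
        return elements


def run_with_mocked_client():
    # Mock client that returns one fixed vector per input text.
    mock_client = mock.MagicMock()
    mock_client.embed_documents.return_value = [[0.1, 0.2], [0.3, 0.4]]

    # Replace create_client on the class so no real client is ever built.
    with mock.patch.object(FakeEncoder, "create_client", return_value=mock_client):
        return FakeEncoder().embed_documents([FakeElement("a"), FakeElement("b")])
```

Patching `create_client` on the class, rather than on an instance, means the replacement is in place before the encoder is constructed, which matters if the client is created during initialization.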