David Potter 1a706771fa
feature: add octoai for embeddings (#2538)
Thanks to Pedro at OctoAI we have a new embedding option.

The following PR adds support for the use of OctoAI embeddings.

Forked from the original OpenAI embeddings class. We removed the use of
the LangChain adaptor, and use OpenAI's SDK directly instead.

Also updated out-of-date example script.

Including new test file for OctoAI.

# Testing
Get a token from our platform at: https://www.octoai.cloud/
For testing one can do the following:
```
export OCTOAI_TOKEN=<your octo token>
python3 examples/embed/example_octoai.py
```

## Testing done
Validated running the above script from within a locally built container
via `make docker-start-dev`

---------

Co-authored-by: potter-potter <david.potter@gmail.com>
2024-02-10 15:27:06 +00:00

20 lines
868 B
Python

from unstructured.documents.elements import Text
from unstructured.embed.octoai import OctoAiEmbeddingConfig, OctoAIEmbeddingEncoder
def test_embed_documents_does_not_break_element_to_dict(mocker):
# Mocked client with the desired behavior for embed_documents
mock_client = mocker.MagicMock()
mock_client.embed_documents.return_value = [1, 2]
# Mock create_client to return our mock_client
mocker.patch.object(OctoAIEmbeddingEncoder, "create_client", return_value=mock_client)
encoder = OctoAIEmbeddingEncoder(config=OctoAiEmbeddingConfig(api_key="api_key"))
elements = encoder.embed_documents(
elements=[Text("This is sentence 1"), Text("This is sentence 2")],
)
assert len(elements) == 2
assert elements[0].to_dict()["text"] == "This is sentence 1"
assert elements[1].to_dict()["text"] == "This is sentence 2"