Mirror of https://github.com/Unstructured-IO/unstructured.git, synced 2025-07-03 07:05:20 +00:00

Thanks to Pedro at OctoAI, we have a new embedding option. This PR adds support for OctoAI embeddings. The new encoder is forked from the original OpenAI embeddings class; we removed the LangChain adapter and call OpenAI's SDK directly instead. The PR also updates the out-of-date example script and adds a new test file for OctoAI.

# Testing

Get a token from our platform at: https://www.octoai.cloud/

For testing, one can do the following:

```
export OCTOAI_API_KEY=<your octo token>
python3 examples/embed/example_octoai.py
```

## Testing done

Validated by running the above script from within a locally built container via `make docker-start-dev`.

---------

Co-authored-by: potter-potter <david.potter@gmail.com>
19 lines
626 B
Python
```python
import os

from unstructured.documents.elements import Text
from unstructured.embed.octoai import OctoAiEmbeddingConfig, OctoAIEmbeddingEncoder

# Build the encoder; the OctoAI API key is read from the OCTOAI_API_KEY environment variable.
embedding_encoder = OctoAIEmbeddingEncoder(
    config=OctoAiEmbeddingConfig(api_key=os.environ["OCTOAI_API_KEY"])
)

# Embed two sample elements; the returned elements carry their vectors in `.embeddings`.
elements = embedding_encoder.embed_documents(
    elements=[Text("This is sentence 1"), Text("This is sentence 2")],
)

# Embed a standalone query string.
query = "This is the query"
query_embedding = embedding_encoder.embed_query(query=query)

# Print each element's embedding alongside the element itself.
for e in elements:
    print(e.embeddings, e)

print(query_embedding, query)

# Report whether the vectors are unit-normalized and how many dimensions they have.
print(embedding_encoder.is_unit_vector(), embedding_encoder.num_of_dimensions())
```
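The example prints raw vectors together with `is_unit_vector()` and `num_of_dimensions()`. A typical next step is ranking the embedded elements against the query embedding. The following is a minimal, stdlib-only sketch that does not call OctoAI: it assumes embeddings are plain lists of floats, uses made-up 3-dimensional toy vectors in place of real OctoAI output, and its helpers (`l2_norm`, `is_unit_vector`, `cosine_similarity`, `rank_by_similarity`) are hypothetical illustrations, not part of the `unstructured` API.

```python
import math
from typing import List, Sequence, Tuple


def l2_norm(vector: Sequence[float]) -> float:
    """Euclidean (L2) length of a vector."""
    return math.sqrt(sum(x * x for x in vector))


def is_unit_vector(vector: Sequence[float], tol: float = 1e-6) -> bool:
    """Hypothetical helper: True if the vector's L2 norm equals 1 within `tol`."""
    return abs(l2_norm(vector) - 1.0) <= tol


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors of equal dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    norm_product = l2_norm(a) * l2_norm(b)
    if norm_product == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    dot = sum(x * y for x, y in zip(a, b))
    return dot / norm_product


def rank_by_similarity(
    query: Sequence[float], candidates: List[Tuple[str, Sequence[float]]]
) -> List[Tuple[str, float]]:
    """Hypothetical helper: score each (text, vector) pair against the query, best first."""
    scored = [(text, cosine_similarity(query, vec)) for text, vec in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy vectors standing in for real embeddings (assumption: not actual OctoAI output).
query_vec = [0.6, 0.8, 0.0]
element_vecs = [
    ("This is sentence 1", [0.0, 1.0, 0.0]),
    ("This is sentence 2", [1.0, 0.0, 0.0]),
]

# Dimension count and unit-length check, mirroring what the example script prints.
print(len(query_vec), is_unit_vector(query_vec))
for text, score in rank_by_similarity(query_vec, element_vecs):
    print(f"{score:.2f}  {text}")
```

When every vector is unit length, the denominator in `cosine_similarity` is 1, so ranking by the plain dot product yields the same order; that is one practical reason to check `is_unit_vector()` before choosing a similarity measure.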