3 Commits

Author SHA1 Message Date
Ronny H
51427b3103
Renamed OpenAiEmbeddingConfig dataclass (#2546) 2024-02-14 17:24:52 +00:00
Roman Isecke
8ba9fadf8a
feat: improve dataclass use for encoders (#2318)
### Description
Apply a pattern similar to the one used for connectors: each encoder has
a nested config dataclass as a field, along with cached content for
things like the client and a sample embedding. This required an update
to the embeddings config in ingest. I left a TODO there because the
current approach breaks on other encoders such as bedrock, since the
parameters in that config don't map to all encoders. The existing
functionality keeps working.

This update makes sure all variables associated with the dataclass exist
when it's instantiated, rather than being added in `__post_init__()` or
`initialize()`, so that other libraries like pydantic can generate
schemas from it correctly. Each class now also follows the connector
pattern: a nested config class is used to instantiate the client, and a
field/property approach is used to cache it.
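
The pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from the PR: `ExampleClient`, `ExampleEmbeddingConfig`, `ExampleEmbeddingEncoder`, and all field names are hypothetical stand-ins. The point is that every attribute is declared as a dataclass field up front, and the client and sample embedding are cached behind properties.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


class ExampleClient:
    """Hypothetical stand-in for a real embedding client."""

    def __init__(self, api_key: str, model_name: str) -> None:
        self.api_key = api_key
        self.model_name = model_name

    def embed(self, text: str) -> List[float]:
        # Fake embedding so the sketch runs without a network call.
        return [float(len(text))]


@dataclass
class ExampleEmbeddingConfig:
    """Nested config holding only what is needed to build the client."""

    api_key: str
    model_name: str = "example-model"  # assumed default, for illustration

    def get_client(self) -> ExampleClient:
        return ExampleClient(api_key=self.api_key, model_name=self.model_name)


@dataclass
class ExampleEmbeddingEncoder:
    config: ExampleEmbeddingConfig
    # Cached values are declared as fields, so they exist at instantiation
    # instead of being attached later in __post_init__() or initialize().
    _client: Optional[ExampleClient] = field(init=False, default=None)
    _sample_embedding: Optional[List[float]] = field(init=False, default=None)

    @property
    def client(self) -> ExampleClient:
        # Build the client on first access, then reuse it.
        if self._client is None:
            self._client = self.config.get_client()
        return self._client

    @property
    def sample_embedding(self) -> List[float]:
        if self._sample_embedding is None:
            self._sample_embedding = self.client.embed("sample")
        return self._sample_embedding


encoder = ExampleEmbeddingEncoder(config=ExampleEmbeddingConfig(api_key="dummy"))
field_names = [f.name for f in fields(ExampleEmbeddingEncoder)]
```

Because `field_names` already lists `config`, `_client`, and `_sample_embedding` before any client is created, tools that introspect dataclass fields see the full shape of the object.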
2023-12-26 22:33:19 +00:00
ryannikolaidis
40523061ca
fix: _add_embeddings_to_elements bug resulting in duplicated elements (#1719)
Currently, when the OpenAIEmbeddingEncoder adds embeddings to Elements in
`_add_embeddings_to_elements`, it overwrites each Element's `to_dict`
method. As a result, every Element mistakenly reports identical values,
except for the actual embedding value. The cause is the nested
`new_to_dict` method used for the overwrite. Instead, this change updates
the original definition of Element itself to accommodate the
`embeddings` field when available. It also adds a test to validate that
values are not duplicated.
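
The message does not spell out the mechanism, but one plausible way to get this symptom is Python's late-binding closures: a function defined inside a loop looks up the loop's variables when it is called, not when it is defined. The sketch below reproduces the symptom under that assumption. `Element`, `FixedElement`, and `add_embeddings_buggy` are simplified, hypothetical stand-ins, not the library's code.

```python
import types


class Element:
    """Simplified stand-in for the real Element class."""

    def __init__(self, text):
        self.text = text
        self.embeddings = None

    def to_dict(self):
        return {"text": self.text}


def add_embeddings_buggy(elements, embeddings):
    for element, embedding in zip(elements, embeddings):
        original_method = element.to_dict

        def new_to_dict(self):
            # original_method is resolved at call time, so after the loop
            # it refers to the LAST element's to_dict for every element.
            d = original_method()
            d["embeddings"] = self.embeddings
            return d

        element.embeddings = embedding
        element.to_dict = types.MethodType(new_to_dict, element)
    return elements


class FixedElement:
    """The approach the commit describes: to_dict handles embeddings itself."""

    def __init__(self, text, embeddings=None):
        self.text = text
        self.embeddings = embeddings

    def to_dict(self):
        d = {"text": self.text}
        if self.embeddings is not None:
            d["embeddings"] = self.embeddings
        return d


buggy = add_embeddings_buggy([Element("a"), Element("b")], [[0.1], [0.2]])
buggy_dicts = [e.to_dict() for e in buggy]

fixed = [FixedElement("a", [0.1]), FixedElement("b", [0.2])]
fixed_dicts = [e.to_dict() for e in fixed]
```

In the buggy variant both dictionaries carry the text of the last element and differ only in `embeddings`; in the fixed variant each element keeps its own values, with no per-instance method patching needed.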
2023-10-12 21:47:32 +00:00