42 Commits

Author SHA1 Message Date
Sebastian Husch Lee
14895f6573
chore: Use token instead of use_auth_token because of deprecation warning (#8552)
* Use token instead of use_auth_token because of deprecation warning

* Fix test

* pylint

* fix linting

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-11-18 11:58:22 +00:00
Ivo Bellin Salarin
c78545dfc0
feat(openai): be tolerant to exceptions (#8526)
* feat: be tolerant to exceptions

if ever an error is raised by the OpenAI API, don't fail the entire processing

* fix: missing import, string separator

* Enhance error handling

* Use batched from more_itertools for compatibility with older Python versions

* Fix batching and add test

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-15 10:52:44 +01:00
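The batching and error tolerance described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: `batched` here is a standard-library stand-in for `more_itertools.batched`, and `embed_tolerantly` and `fake_embed_batch` are hypothetical names invented for the example.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, Tuple


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Stdlib stand-in for more_itertools.batched: yield lists of up to `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def embed_tolerantly(
    texts: Iterable[str],
    embed_batch: Callable[[List[str]], List[List[float]]],
    batch_size: int = 2,
) -> Tuple[List[List[float]], List[List[str]]]:
    """Embed texts batch by batch; a failing batch is recorded and skipped."""
    embeddings: List[List[float]] = []
    failed: List[List[str]] = []
    for batch in batched(texts, batch_size):
        try:
            embeddings.extend(embed_batch(batch))
        except Exception as exc:  # assumption: real code may catch a narrower API error
            failed.append(batch)
            print(f"Skipping batch {batch!r}: {exc}")
    return embeddings, failed


def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    """Hypothetical embedding call that fails whenever the batch contains 'bad'."""
    if "bad" in batch:
        raise RuntimeError("simulated API error")
    return [[float(len(text))] for text in batch]


embeddings, failed = embed_tolerantly(["a", "bb", "bad", "cccc", "ddddd"], fake_embed_batch)
```

With a batch size of 2, only the batch containing the failing item is dropped; the remaining batches are still embedded.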
Ajit Singh
6cf13e8b98
enhancement: reduced usage of numpy and substituted built-in libraries (#8418)
* reduced usage of numpy and substituted built-in libraries

* added release note

* edited expit function to support both float and list inputs (this case was causing an error in CI)

* revert code, numpy can't be removed here

* more cleaning

* fix relnote

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-10-18 15:42:19 +02:00
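As a rough illustration of an `expit` (logistic sigmoid) helper that accepts either a single float or a list without numpy, consider the sketch below. It is an assumption about the general shape of such a function, not the implementation merged in this commit.

```python
import math
from typing import List, Union

Number = Union[int, float]


def expit(x: Union[Number, List[Number]]) -> Union[float, List[float]]:
    """Logistic sigmoid using only the standard library; accepts a number or a list."""
    if isinstance(x, (list, tuple)):
        return [expit(value) for value in x]
    # Branch on the sign so that math.exp never receives a large positive argument.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


single = expit(0.0)
many = expit([0, 2.0, -2.0])
```

Branching on the sign keeps the computation from overflowing for inputs with a large magnitude.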
Alper
b40f0c8b5d
feat: SentenceTransformersTextEmbedder supports config_kwargs (#8432)
* add config_kwargs

* disable PLR0913 for a specific function

* add a release note

* refer to AutoConfig in config_kwargs docstring

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: Julian Risch <julianrisch@gmx.de>
2024-10-14 16:08:53 +00:00
David S. Batista
b81abc0c85
feat: SentenceTransformersDocumentEmbedder supports config_kwargs (#8433)
* initial import

* adding release notes
2024-10-14 17:43:04 +02:00
David S. Batista
97126eb544
fix: changing default model to gpt-4o-mini on OpenAI API calls (#8360)
* changing default model to gpt-4o-mini

* adding release notes

* fixing some missed tests

* fixing some more missed tests

* fixing one last missed test

* fixing linting issues

* making pylint happy about an end2end test

* changing if test to walrus operator

* fixing azure embedder from ada to text-embedding-ada-002
2024-09-17 10:36:42 +02:00
Sebastian Husch Lee
06dd5c2f37
feat (v2): Update so model_max_length updates max_seq_length for Sentence Transformers (#8334)
* Update so model_max_length does what is expected

* Add release notes

* Some fixes

* Another test
2024-09-06 11:37:56 +02:00
Nicola Procopio
4c798470b2
added precision parameter to sentence transformers embeddings (#8179)
* added `precision` parameter to sentence transformers embeddings

* fixed test

* Update haystack/components/embedders/sentence_transformers_document_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update test/components/embedders/test_sentence_transformers_text_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update test/components/embedders/test_sentence_transformers_text_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* fix format

* Update sentence_transformers_text_embedder.py

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-08-09 11:38:47 +02:00
Sebastian Husch Lee
c90495c2e8
feat: Add model and tokenizer kwargs to TransformersSimilarityRanker, SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder (#8145)
* Start adding model and tokenizer kwargs support

* Add model and tokenizer kwargs to doc embedder

* Some updates and fixes in tests

* Fix more tests

* Fix tests

* Add release note

* Fix test

* Add from_dict tests
2024-08-02 10:37:10 +02:00
Nicola Procopio
47f4db8698
added truncate_dim to sentence transformers embedder (#8077)
* added truncate_dim to sentence transformers embedder

* Update haystack/components/embedders/sentence_transformers_document_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update releasenotes/notes/release-note-2b603a123cd36214.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* fixed parameter description

* added test for truncation to text embedder

* fix format

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-26 10:39:48 +02:00
Sebastian Husch Lee
c121c86c4c
fix: Fix from_dict methods of components using HF models to work with default values (#8003)
* Fix from_dict to work if device isn't provided in init params

* Minor refactoring of from_dict for components that load HF models

* Add tests

* Update tests to test loading with all default parameters

* Add more tests

* Add release notes

* Add unit test for whisper local

* Update reno

* Add fix for ExtractiveReader

* Fix NamedEntityExtractor
2024-07-10 12:18:05 +02:00
Nitanshu Vashistha
cd8a5b98fe
feat: Configure max_retries & timeout for AzureOpenAITextEmbedder (#7993)
max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable, falling back to 5.

timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable, falling back to 30.

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
2024-07-09 09:56:46 +02:00
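The fallback order described in this commit message (explicit argument, then environment variable, then default) might look like the sketch below. The helper names `resolve_max_retries` and `resolve_timeout` are hypothetical and exist only for this example; the actual components may resolve these values differently.

```python
import os
from typing import Optional


def resolve_max_retries(max_retries: Optional[int] = None) -> int:
    """Explicit value wins; otherwise OPENAI_MAX_RETRIES; otherwise 5."""
    if max_retries is not None:
        return max_retries
    return int(os.environ.get("OPENAI_MAX_RETRIES", "5"))


def resolve_timeout(timeout: Optional[float] = None) -> float:
    """Explicit value wins; otherwise OPENAI_TIMEOUT; otherwise 30 seconds."""
    if timeout is not None:
        return timeout
    return float(os.environ.get("OPENAI_TIMEOUT", "30.0"))
```

Checking `is not None` rather than truthiness lets a caller pass `0` explicitly without triggering the fallback.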
Nitanshu Vashistha
f9d53c5ca8
feat: Configure max_retries and timeout for AzureOpenAIDocumentEmbedder (#7994)
* feat: Configure max_retries & timeout for AzureOpenAIDocumentEmbedder

max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable, falling back to 5.

timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable, falling back to 30.

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>

* Update retries-and-timeout-for-AzureOpenAIDocumentEmbedder-006fd84204942e43.yaml

* Update haystack/components/embedders/azure_document_embedder.py

* Update haystack/components/embedders/azure_document_embedder.py

---------

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-07-08 22:35:25 +02:00
Vladimir Blagojevic
535a281eec
feat: Add option to use HF_TOKEN as env var for authentication across all HF components (#7942)
* Read both HF_API_TOKEN and HF_TOKEN env vars in all HF related components

* Add reno note

* Test fixes

* More test updates

* More test updates
2024-06-27 10:31:58 +02:00
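Reading a token from either of two environment variables can be sketched as below. The helper `first_env_var` is a hypothetical name, and the precedence shown (HF_API_TOKEN checked before HF_TOKEN) is an assumption made for the example rather than something this log confirms.

```python
import os
from typing import Mapping, Optional, Sequence


def first_env_var(names: Sequence[str], env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the value of the first variable in `names` that is set and non-empty."""
    source = os.environ if env is None else env
    for name in names:
        value = source.get(name)
        if value:
            return value
    return None


# Assumed precedence for illustration: HF_API_TOKEN first, then HF_TOKEN.
token = first_env_var(["HF_API_TOKEN", "HF_TOKEN"], {"HF_TOKEN": "example-token"})
```

Passing the mapping explicitly keeps the helper easy to test without touching the real process environment.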
Stefano Fiorucci
75ad76a7ce
chore: remove deprecated TEI embedders (#7907)
* remove deprecated TEI embedders

* rm from the embedders init

* rm related tests
2024-06-21 10:36:12 +02:00
Carlos Fernández
686a4999cf
feat: widen support of env vars in OpenAI components (#7653)
* add environment variables to the _enviroment.py file

* add support for two of the three variables

* Add support for 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' in OpenAIDocumentEmbedder.

* Replicate support for env vars in OpenAITextEmbedder.

* Add support for env vars in OpenAIGenerator.

* Add support for env vars in OpenAIChatGenerator.

* add docstrings and reno

* add params to __init__ in OpenAIDocumentEmbedder

* add params to __init__ in OpenAITextEmbedder

* make fully functional implementation of env vars and unit tests

* update reno

* Update haystack/components/embedders/openai_text_embedder.py

* reverse changes to telemetry/_enviroment.py

* Update haystack/components/embedders/openai_text_embedder.py

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-05-15 21:58:41 +00:00
Sebastian Husch Lee
a2be90b95a
fix: Update device deserialization for components that use local models (#7686)
* fix: Update device deserialization for SentenceTransformersTextEmbedder

* Add unit test

* Fix unit test

* Make same change to doc embedder

* Add release notes

* Add same change to Diversity Ranker and Named Entity Extractor

* Add unit test

* Add the same for whisper local

* Update release notes
2024-05-14 08:36:14 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules (#7675)
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Stefano Fiorucci
7c9532b200
fix broken serialization of HFAPI components (#7661) 2024-05-08 17:14:37 +02:00
Stefano Fiorucci
39be515ba6
skip HF integrations tests if running from fork (#7517) 2024-04-09 17:47:13 +02:00
Stefano Fiorucci
eff53a9131
feat: HuggingFaceAPIDocumentEmbedder (#7485)
* add HuggingFaceAPITextEmbedder

* add HuggingFaceAPITextEmbedder

* rm unneeded else

* wip

* small fixes

* deprecation; reno

* Apply suggestions from code review

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* make params mandatory

* changes requested

* fix test

* fix test

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-08 15:06:26 +02:00
Stefano Fiorucci
c91bd49cae
feat: HuggingFaceAPITextEmbedder (#7484)
* add HuggingFaceAPITextEmbedder

* add HuggingFaceAPITextEmbedder

* rm unneeded else

* small fixes

* changes requested

* fix test
2024-04-08 14:22:54 +02:00
Ashwin Mathur
1c7d1618d8
Add truncate and normalize parameters to TEI Embedders (#7460) 2024-04-03 16:41:30 +02:00
Nicola Procopio
42c5b7af32
feat: added dimensions parameters to Azure OpenAI Embedders (#7449)
* added dimensions parameter to AzureOpenAIEmbedders

* created releasenote

* update release note

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-04-02 14:04:16 +02:00
Vladimir Blagojevic
2aae8472e7
feat: Add trust_remote_code init param to SentenceTransformer embedders (#7356)
* Add trust_remote_code init param to SentenceTransformer embedders

* Add release note

* Go with no kwargs solution

* Update haystack/components/embedders/sentence_transformers_document_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Pydoc fix

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-03-14 11:14:04 +01:00
Ashwin Mathur
8d7a58347d
fix: HuggingFaceTEITextEmbedder returning embedding of incorrect shape when used with Docker endpoint (#7319)
* Fix HuggingFaceTEITextEmbedder

* Update haystack/components/embedders/hugging_face_tei_text_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Improve imports; Add additional tests

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-03-07 16:23:57 +01:00
Stefano Fiorucci
d00f171f8b
refactor!: Sentence Transformers Embedders - new devices mgmt (#7033)
* new device mgmt for Sentence Transformers embedders

* reno
2024-02-19 14:52:44 +01:00
Tuana Çelik
e2cee468fc
fix: Adding api_base_url to OpenAITextEmbedder self assignments (#7004)
* assigning api_base_url

This fix resolves issues with the MistralTextEmbedder integration

* adding base url to `to_dict` and the tests

* adding release note

* Update fix-openai-base-url-assignment-0570a494d88fe365.yaml

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-02-15 17:35:28 +01:00
sahusiddharth
3bd6ba93ca
feat:Add dimensions parameter to OpenAI Embedders to fully support th… (#6841)
* feat:Add dimensions parameter to OpenAI Embedders to fully support the new models

* fixed linting

* changed != None to is not None
2024-02-05 16:20:46 +01:00
Madeesh Kannan
27d1af3068
feat!: Use Secret for passing authentication secrets to components (#6887)
* feat!: Use `Secret` for passing authentication secrets to components

* Add comment to clarify type ignore
2024-02-05 13:17:01 +01:00
Vladimir Blagojevic
6e86f4e26a
Update embedding integration tests (#6823) 2024-01-24 15:22:47 +01:00
ZanSara
288ed150c9
feat!: Rename model_name or model_name_or_path to model in all Embedder classes (#6733)
* rename model parameter in the openai doc embedder

* fix tests for openai doc embedder

* rename model parameter in the openai text embedder

* fix tests for openai text embedder

* rename model parameter in the st doc embedder

* fix tests for st doc embedder

* rename model parameter in the st backend

* fix tests for st backend

* rename model parameter in the st text embedder

* fix tests for st text embedder

* fix docstring

* fix pipeline utils

* fix e2e

* reno

* fix the indexing pipeline _create_embedder function

* fix e2e eval rag pipeline

* pytest
2024-01-12 15:30:17 +01:00
Massimiliano Pippi
93b2aaee09
chore: move DocumentJoiner to new joiners package (#6692)
* move DocumentJoiner to new joiners package

* relnote

* leftovers

* fix docstrings generation

* fix unrelated pydoc misconfiguration

* more unrelated work, yay!

* fix assertions
2024-01-08 22:06:27 +01:00
Silvano Cerza
9445b2d466
Fix skipif with empty env var (#6704) 2024-01-08 19:19:14 +01:00
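One plausible reading of this fix is that a variable set to an empty string should count as missing when deciding whether to skip an integration test. The sketch below shows that distinction; `should_skip` is a hypothetical helper, and in a test suite its result would typically feed a `pytest.mark.skipif` condition.

```python
import os
from typing import Mapping, Optional


def should_skip(var_name: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Skip when the variable is unset or set to an empty string."""
    source = os.environ if env is None else env
    # `get(...) is None` would miss the empty-string case; truthiness covers both.
    return not source.get(var_name)


skip_when_missing = should_skip("OPENAI_API_KEY", {})
skip_when_empty = should_skip("OPENAI_API_KEY", {"OPENAI_API_KEY": ""})
skip_when_set = should_skip("OPENAI_API_KEY", {"OPENAI_API_KEY": "example-key"})
```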
Silvano Cerza
607e7d1488
Skip integration tests if env var is missing (#6703) 2024-01-08 17:15:10 +01:00
Vladimir Blagojevic
552f0e394b
feat: Add Azure embedders support (#6676)
* Add Azure embedders
---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-01-05 15:49:25 +01:00
Stefano Fiorucci
c773c30c66
refactor!: rename all remaining metadata to meta (#6650)
* change metadata to meta

* release note
2023-12-28 12:18:15 +01:00
Vladimir Blagojevic
4d08be0c2a
feat: Update OpenAI Python Client in Haystack 2.x (#6584)
* Update openai python client

* Add release note

* Consolidate multiple mock_chat_completion into one

* Ensure all components have api_base_url, organization params

* Update tests

* Enable function calling

* Oversight

* Minor fixes, add streaming test mocks

* Apply suggestions from code review

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* metadata -> meta

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2023-12-21 16:21:24 +01:00
Ashwin Mathur
fc88ef7076
feat: Add HuggingFace TEI Embedders - HuggingFaceTEITextEmbedder and HuggingFaceTEIDocumentEmbedder (#6602)
* Add TEI Embedders

* Add release notes

* Update release notes with usage examples
2023-12-21 12:16:36 +01:00
Massimiliano Pippi
7c05f37a53
remove unit marker (#6450) 2023-11-29 19:24:25 +01:00
Silvano Cerza
e6637f5ec2 Fix all tests 2023-11-24 14:48:43 +01:00
Massimiliano Pippi
8adb8bbab8
Remove preview folder in test/
---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2023-11-24 11:52:55 +01:00