73 Commits

Author SHA1 Message Date
Sebastian Husch Lee
ce0917e586
feat: Add raise_on_failure boolean parameter to OpenAIDocumentEmbedder and AzureOpenAIDocumentEmbedder (#9474)
* Add raise_on_failure to OpenAIDocumentEmbedder

* Add reno

* Add parameter to Azure Doc embedder as well

* Fix bug

* Update reno

* PR comments

* update reno
2025-06-03 10:22:34 +00:00
Stefano Fiorucci
2616d4d55b
test: speed up some tests + minor refactorings (#9451)
* this is an integration test

* more improvements

* rm redundant comments
2025-05-29 09:49:11 +02:00
David S. Batista
da60156174
chore: removing unused imports from tests (#9446) 2025-05-26 16:22:51 +00:00
Sebastian Husch Lee
e6a53b9dca
fix: Add missing timeout and max_retries to OpenAITextEmbedder and OpenAIDocumentEmbedder (#9421)
* Add missing params to to_dict for OpenAI embedders

* add reno

* Track variable internally instead of using client
2025-05-22 09:19:14 +00:00
Jan Trienes
83b087caf4
feat: add local_files_only to sentence-transformers embedders (#9400)
* feat: add local_files_only to sentence-transformers embedders

* add release note

* Fix wording

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-05-19 16:11:49 +00:00
David S. Batista
f233e06f0a
feat: adding a new Protocol for TextEmbedder (#9353)
* initial import

* removing unused imports

* adding an Embedder Protocol

* adding tests

* adding tests

* adding release notes

* renaming dir

* removing dir

* cleaning

* adding clean tests

* dealing with ellipsis and pylint

* wip: extending tests

* cleaning extended tests

* adding an invalid TextEmbedder
2025-05-12 12:35:09 +02:00
Stefano Fiorucci
38c39a49de
test: review integration tests (#9306)
* AzureOCR: convert integration test to unit test and simplify

* clean up HuggingFaceAPITextEmbedder

* clean up LinkContentFetcher

* simplify HuggingFaceLocalGenerator

* clean up OpenAIGenerator

* OpenAIChatGenerator

* SentenceTransformersDiversityRanker

* TransformersSimilarityRanker

* ChatMessage: rm outdated tests

* fail fast false

* typo
2025-04-25 09:07:57 +02:00
Stefano Fiorucci
e3d4e21237
test: mark more tests as slow (#9296)
* test: mark tests as slow

* alphabetical order; install xet

* revert pyproject

* Trigger Build

* simplify tests as suggested

* add comment to workflow
2025-04-24 10:25:13 +02:00
Grig Alex
14669419f2
feat: Allow OpenAI client config in other components (#9270)
* Add http config to generators

* Add http config to RemoteWhisperTranscriber

* Add http config to embedders

* Add notes of http config

* disable linter too-many-positional-arguments

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
2025-04-22 09:44:55 +00:00
Amna Mubashar
498637788a
feat: Allow OpenAI client config in OpenAIChatGenerator and AzureOpenAIChatGenerator (#9215)
* Allow OpenAI client config in chat generator

* Add init_http_client as a util method

* Update azure chat gen

* Fix linting
2025-04-16 18:32:13 +02:00
MetroCat69
f7ac4b35cb
feat: add run_async for HuggingFaceAPIDocumentEmbedder (#9226)
* added async support for HuggingFaceAPIDocumentEmbedder

* added type annotations, removed unused import

* Trigger mark test completed

* Apply suggestions from code review

* utility function

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-04-16 09:54:36 +02:00
David S. Batista
45aa9608b5
removing async test for nonexistent model (#9208) 2025-04-10 12:38:35 +02:00
Arseniy Shkunkov
bac29d9337
feat: add run_async for HuggingFaceAPITextEmbedder (#9204)
* Initial commit

* adding release notes

* adding async integrations tests

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-04-10 11:44:40 +02:00
Stefano Fiorucci
45cd6f43d6
feat: make AzureOpenAIDocumentEmbedder inherit from OpenAIDocumentEmbedder - async support (#9189)
* feat: make AzureOpenAIDocumentEmbedder inherit from OpenAIDocumentEmbedder - async support

* fix type

* rm unused import

* do not replace newlines

* fix test
2025-04-08 12:51:45 +02:00
Stefano Fiorucci
6f4e70050f
feat: make AzureOpenAITextEmbedder inherit from OpenAITextEmbedder - async support (#9188)
* draft

* updates

* relnote
2025-04-08 12:51:34 +02:00
Francesco Nuzzo
c539ffa4c3
feat: add run_async for OpenAITextEmbedder (#9084)
* feat: add run_async for OpenAITextEmbedder

* fix: typing

* fix: avoid replacing newlines with spaces.

Also fix kwargs "input" field to include prefix and suffix

* ci: add release notes

* expand release notes; unit tests

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
Co-authored-by: Michele Pangrazzi <xmikex83@gmail.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-04-08 07:20:10 +00:00
Mohammed Abdul Razak Wahab
a2f73d134d
feat(embedders): Add async support for OpenAI document embedder (#9140)
* feat(embedders): Add async support for OpenAI document embedder

* add release notes

* resolve review comments

* Update releasenotes/notes/openai-document-embedder-async-support-b46f1e84043da366.yaml

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Update openai-document-embedder-async-support-b46f1e84043da366.yaml

---------

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-04-04 11:55:59 +00:00
Stefano Fiorucci
d1db061058
test: make Azure embedders correctly run on PRs from forks (#9168) 2025-04-04 11:16:58 +02:00
Amna Mubashar
dd6ff10d3b
feat: allow OpenAI client config in AzureOpenAI embedders (#9136)
* Allow OpenAI client config
2025-04-02 16:50:48 +02:00
scara
18367203a8
fix: manage max_retries=0 in AzureOpenAIGenerator, AzureOpenAIChatGenerator, AzureOpenAITextEmbedder, AzureOpenAIDocumentEmbedder (#9128)
* fix: manage max_retries=0 in AzureOpenAIGenerator and AzureOpenAIChatGenerator

* fix: manage max_retries=0 in AzureOpenAITextEmbedder and AzureOpenAIDocumentEmbedder
2025-03-28 13:11:09 +01:00
Stefano Fiorucci
1c1030efc6
chore: make Haystack warnings consistent (#9083)
* chore: make Haystack warnings consistent

* more structured logging

* small fixes
2025-03-21 18:18:55 +01:00
Sebastian Husch Lee
4edefe3e56
Feat: Support Azure Workload Identity Credential (#9012)
* Start adding support for passing callable to Azure components

* Add to chat version

* Fix test

* Add reno

* Add support to azure doc and text embedder

* Rename

* update llm metadata extractor

* Add tests for text embedder

* Update tests

* Remove unused fixture and import

* Update reno
2025-03-12 13:45:40 +01:00
Mohammed Abdul Razak Wahab
0d65b4caa7
feat: Enhance error handling in Azure document embedder (#8941)
* feat: Enhance error handling in Azure document embedder

* add release notes

* address review comments

* Update releasenotes/notes/add-azure-embedder-exception-handler-c10ea46fb536de3b.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* more alignment with OpenAI impl

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-03-04 11:16:08 +01:00
Julian Risch
6652dd7550
Revert "test: skip HF API live integration tests (#8889)" (#8914)
* Revert "test: skip HF API live integration tests (#8889)"

This reverts commit 56a3a9bd61b7391ae91e3d8179b3b33918ef4932.

* Replace zephyr-7b-beta model with SmolLM2-1.7B-Instruct

* Use zephyr-7b-beta model but extend instructions

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-02-25 09:03:20 +01:00
Stefano Fiorucci
56a3a9bd61
test: skip HF API live integration tests (#8889)
* skip HF API integration tests

* better wording
2025-02-20 16:38:57 +00:00
Ulises M
bfdad40a80
feat: Add ONNX & OpenVINO backend support, and torch dtype kwargs in Sentence Transformers Components (#8813)
* initial rough draft

* expose backend instead of extracting from model_kwargs

* explicitly set backend model path

* add reno

* expose backend for ST diversity backend

* add dtype tests and expose kwargs to ST ranker for backend parameters

* skip dtype tests as torch isn't compiled with cuda

* add new openvino dependency release, unskip tests

* resolve suggestion

* mock calls, turn integrations into unit tests

* remove unnecessary test dependencies
2025-02-13 12:04:14 +01:00
György Orosz
d2348ad462
feat: SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder can accept and pass any arguments to SentenceTransformer.encode (#8806)
* feat: SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder can accept and pass any arguments to SentenceTransformer.encode

* refactor: encode_kwargs parameter of SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder made to be the last positional parameter for backward compatibility reasons

* docs: added explanation for encode_kwargs in SentenceTransformersTextEmbedder and SentenceTransformersDocumentEmbedder

* test: added tests for encode_kwargs in SentenceTransformersTextEmbedder and SentenceTransformersDocumentEmbedder

* doc: removed empty lines from docstrings of SentenceTransformersTextEmbedder and SentenceTransformersDocumentEmbedder

* refactor: encode_kwargs parameter of SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder made to be the last positional parameter for backward compatibility (part II.)
2025-02-05 16:09:35 +00:00
Stefano Fiorucci
5ae94886b2
fix: fix test failures with Transformers models in PRs from forks (#8809)
* trigger

* try pinning sentence transformers

* make integr tests run right away

* pin transformers instead

* older transformers version

* rm transformers pin

* try ignoring cache

* change ubuntu version

* try removing token

* try again

* more HF_API_TOKEN local deletions

* restore test priority

* rm leftover

* more deletions

* moreee

* more

* deletions

* restore jobs order
2025-02-04 19:08:37 +01:00
Stefano Fiorucci
877f826da0
refactor: HF API Embedders - use InferenceClient.feature_extraction instead of InferenceClient.post (#8794)
* HF API Embedders: refactoring

* rename variables

* rm leftovers

* rm pin

* rm unused import

* relnote

* warning with truncate/normalize and serverless inference API

* test that warnings are raised
2025-02-03 15:11:16 +00:00
Amna Mubashar
db76ae2847
feat: add default_headers for Azure embedders (#8699)
* Add default_headers param to azure embedders
2025-01-12 17:41:38 +01:00
mathislucka
fe9b1e29d4
CI: fix format after newly introduced formatting rules from ruff release (#8696) 2025-01-09 16:25:55 +00:00
Sebastian Husch Lee
14895f6573
chore: Use token instead of use_auth_token because of deprecation warning (#8552)
* Use token instead of use_auth_token because of deprecation warning

* Fix test

* pylint

* fix linting

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-11-18 11:58:22 +00:00
Ivo Bellin Salarin
c78545dfc0
feat(openai): be tolerant to exceptions (#8526)
* feat: be tolerant to exceptions

if ever an error is raised by the OpenAI API, don't fail the entire processing

* fix: missing import, string separator

* Enhance error handling

* Use batched from more_itertools for compatibility with older Python versions

* Fix batching and add test

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-15 10:52:44 +01:00
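The behaviour described in #8526 (do not fail the whole run when one API call raises) can be illustrated with a minimal sketch. This is not the code from the commit: `embed_tolerantly` and `embed_batch` are hypothetical names, the `batched` helper is a standard-library stand-in for the `more_itertools` function the commit mentions, and the `raise_on_failure` flag only mirrors the parameter name from #9474.

```python
import logging
from itertools import islice
from typing import Callable, Iterable, Iterator, List

logger = logging.getLogger(__name__)


def batched(items: Iterable[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` items (stdlib stand-in)."""
    iterator = iter(items)
    while batch := list(islice(iterator, size)):
        yield batch


def embed_tolerantly(
    texts: List[str],
    embed_batch: Callable[[List[str]], List[List[float]]],  # hypothetical API call
    batch_size: int = 2,
    raise_on_failure: bool = False,
) -> List[List[float]]:
    """Embed texts batch by batch; a failing batch is logged and skipped."""
    embeddings: List[List[float]] = []
    for batch in batched(texts, batch_size):
        try:
            embeddings.extend(embed_batch(batch))
        except Exception as exc:  # a real client would catch its own API error type
            if raise_on_failure:
                raise
            logger.warning("Skipping a batch of %d texts: %s", len(batch), exc)
    return embeddings
```

With `raise_on_failure=False` the caller gets embeddings only for the batches that succeeded, so the result can be shorter than the input; with `raise_on_failure=True` the first error propagates.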
Ajit Singh
6cf13e8b98
enhancement: reduced usage of numpy and substituted built-in libraries (#8418)
* reduced usage of numpy and substituted built-in libraries

* added release note

* edited expit function to support both float as well as list (this case was causing a CI error)

* revert code, numpy can't be removed here

* more cleaning

* fix relnote

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-10-18 15:42:19 +02:00
Alper
b40f0c8b5d
feat: SentenceTransformersTextEmbedder supports config_kwargs (#8432)
* add config_kwargs

* disable PLR0913 for a specific function

* add a release note

* refer to AutoConfig in config_kwargs docstring

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: Julian Risch <julianrisch@gmx.de>
2024-10-14 16:08:53 +00:00
David S. Batista
b81abc0c85
feat: SentenceTransformersDocumentEmbedder supports config_kwargs (#8433)
* initial import

* adding release notes
2024-10-14 17:43:04 +02:00
David S. Batista
97126eb544
fix: changing default model to gpt-4o-mini on OpenAI API calls (#8360)
* changing default model to gpt-4o-mini

* adding release notes

* fixing some missed tests

* fixing some more missed tests

* fixing one last missed test

* fixing linting issues

* making pylint happy about an end2end test

* changing if test to walrus operator

* fixing azure embedder from ada to text-embedding-ada-002
2024-09-17 10:36:42 +02:00
Sebastian Husch Lee
06dd5c2f37
feat (v2): Update so model_max_length updates max_seq_length for Sentence Transformers (#8334)
* Update so model_max_length does what is expected

* Add release notes

* Some fixes

* Another test
2024-09-06 11:37:56 +02:00
Nicola Procopio
4c798470b2
added precision parameter to sentence transformers embeddings (#8179)
* added `precision` parameter to sentence transformers embeddings

* fixed test

* Update haystack/components/embedders/sentence_transformers_document_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update test/components/embedders/test_sentence_transformers_text_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update test/components/embedders/test_sentence_transformers_text_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* fix format

* Update sentence_transformers_text_embedder.py

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-08-09 11:38:47 +02:00
Sebastian Husch Lee
c90495c2e8
feat: Add model and tokenizer kwargs to TransformersSimilarityRanker, SentenceTransformersDocumentEmbedder, SentenceTransformersTextEmbedder (#8145)
* Start adding model and tokenizer kwargs support

* Add model and tokenizer kwargs to doc embedder

* Some updates and fixes in tests

* Fix more tests

* Fix tests

* Add release note

* Fix test

* Add from_dict tests
2024-08-02 10:37:10 +02:00
Nicola Procopio
47f4db8698
added truncate_dim to sentence transformers embedder (#8077)
* added truncate_dim to sentence transformers embedder

* Update haystack/components/embedders/sentence_transformers_document_embedder.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update releasenotes/notes/release-note-2b603a123cd36214.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* fixed parameter description

* added test for truncation to text embedder

* fix format

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-07-26 10:39:48 +02:00
Sebastian Husch Lee
c121c86c4c
fix: Fix from_dict methods of components using HF models to work with default values (#8003)
* Fix from_dict to work if device isn't provided in init params

* Minor refactoring of from_dict for components that load HF models

* Add tests

* Update tests to test loading with all default parameters

* Add more tests

* Add release notes

* Add unit test for whisper local

* Update reno

* Add fix for ExtractiveReader

* Fix NamedEntityExtractor
2024-07-10 12:18:05 +02:00
Nitanshu Vashistha
cd8a5b98fe
feat: Configure max_retries & timeout for AzureOpenAITextEmbedder (#7993)
max_retries: if not set is read from the OPENAI_MAX_RETRIES
env variable or set to 5.

timeout: if not set is read from the OPENAI_TIMEOUT
env variable or set to 30.

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
2024-07-09 09:56:46 +02:00
Nitanshu Vashistha
f9d53c5ca8
feat: Configure max_retries and timeout for AzureOpenAIDocumentEmbedder (#7994)
* feat: Configure max_retries & timeout for AzureOpenAIDocumentEmbedder

max_retries: if not set is read from the OPENAI_MAX_RETRIES
env variable or set to 5.

timeout: if not set is read from the OPENAI_TIMEOUT
env variable or set to 30.

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>

* Update retries-and-timeout-for-AzureOpenAIDocumentEmbedder-006fd84204942e43.yaml

* Update haystack/components/embedders/azure_document_embedder.py

* Update haystack/components/embedders/azure_document_embedder.py

---------

Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-07-08 22:35:25 +02:00
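The fallback order stated in #7993 and #7994 (explicit argument, then environment variable, then a default of 5 retries or 30 seconds) can be sketched as below. This is an illustration under assumptions, not the components' code: `resolve_max_retries` and `resolve_timeout` are hypothetical helper names, and only the environment variable names and default values come from the commit messages.

```python
import os
from typing import Optional


def resolve_max_retries(max_retries: Optional[int] = None) -> int:
    """Explicit value wins, then OPENAI_MAX_RETRIES, then the default of 5."""
    if max_retries is not None:  # `is not None` keeps an explicit 0
        return max_retries
    return int(os.environ.get("OPENAI_MAX_RETRIES", "5"))


def resolve_timeout(timeout: Optional[float] = None) -> float:
    """Explicit value wins, then OPENAI_TIMEOUT, then the default of 30 seconds."""
    if timeout is not None:
        return timeout
    return float(os.environ.get("OPENAI_TIMEOUT", "30.0"))
```

Comparing against `None` rather than relying on truthiness means that an explicit `max_retries=0` is kept instead of being replaced by the default.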
Vladimir Blagojevic
535a281eec
feat: Add option to use HF_TOKEN as env var for authentication across all HF components (#7942)
* Read both HF_API_TOKEN and HF_TOKEN env vars in all HF related components

* Add reno note

* Test fixes

* More test updates

* More test updates
2024-06-27 10:31:58 +02:00
Stefano Fiorucci
75ad76a7ce
chore: remove deprecated TEI embedders (#7907)
* remove deprecated TEI embedders

* rm from the embedders init

* rm related tests
2024-06-21 10:36:12 +02:00
Carlos Fernández
686a4999cf
feat: widen support of env vars in OpenAI components (#7653)
* add environment variables to the _enviroment.py file

* add support for two of the three variables

* Add support for 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' on OpenAIDocumentEmbedder.

* Replicate support for env vars in OpenAITextEmbedder.

* Add support for env vars in OpenAIGenerator.

* Add support for env vars in OpenAIChatGenerator.

* add docstrings and reno

* add params to __init__ in OpenAIDocumentEmbedder

* add params to __init__ in OpenAITextEmbedder

* make fully functional implementation of env vars and unit tests

* update reno

* Update haystack/components/embedders/openai_text_embedder.py

* reverse changes to telemetry/_enviroment.py

* Update haystack/components/embedders/openai_text_embedder.py

---------

Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-05-15 21:58:41 +00:00
Sebastian Husch Lee
a2be90b95a
fix: Update device deserialization for components that use local models (#7686)
* fix: Update device deserialization for SentenceTransformersTextEmbedder

* Add unit test

* Fix unit test

* Make same change to doc embedder

* Add release notes

* Add same change to Diversity Ranker and Named Entity Extractor

* Add unit test

* Add the same for whisper local

* Update release notes
2024-05-14 08:36:14 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules (#7675)
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Stefano Fiorucci
7c9532b200
fix broken serialization of HFAPI components (#7661) 2024-05-08 17:14:37 +02:00