haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-10 23:04:02 +00:00

Author	SHA1	Message	Date
Sebastian Husch Lee	ce0917e586	feat: Add `raise_on_failure` boolean parameter to `OpenAIDocumentEmbedder` and `AzureOpenAIDocumentEmbedder` (#9474 ) * Add raise_on_failure to OpenAIDocumentEmbedder * Add reno * Add parameter to Azure Doc embedder as well * Fix bug * Update reno * PR comments * update reno	2025-06-03 10:22:34 +00:00
Stefano Fiorucci	2616d4d55b	test: speed up some tests + minor refactorings (#9451 ) * this is an integration test * more improvements * rm redundant comments	2025-05-29 09:49:11 +02:00
David S. Batista	da60156174	chore: removing unused imports from tests (#9446 )	2025-05-26 16:22:51 +00:00
Sebastian Husch Lee	e6a53b9dca	fix: Add missing `timeout` and `max_retries` to `OpenAITextEmbedder` and `OpenAIDocumentEmbedder` (#9421 ) * Add missing params to to_dict for OpenAI embedders * add reno * Track variable internally instead of using client	2025-05-22 09:19:14 +00:00
Jan Trienes	83b087caf4	feat: add `local_files_only` to sentence-transformers embedders (#9400 ) * feat: add to sentence-transformers embedders * add release note * Fix wording Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2025-05-19 16:11:49 +00:00
David S. Batista	f233e06f0a	feat : adding a new `Protocol` for `TextEmbedder` (#9353 ) * initial import * removing unused imports * adding an Embbeder Protocol * adding tests * adding tests * adding release notes * renaming dir * removing dir * cleaning * adding clean tests * dealing eith elipsis and pylint * wip: extending tests * cleaning extended tests * adding an invalid TextEmbedder	2025-05-12 12:35:09 +02:00
Stefano Fiorucci	38c39a49de	test: review integration tests (#9306 ) * AzureOCR: convert integration test to unit test and simplify * clean up HuggingFaceAPITextEmbedder * clean up LinkContentFetcher * simplify HuggingFaceLocalGenerator * clean up OpenAIGenerator * OpenAIChatGenerator * SentenceTransformersDiversityRanker * TransformersSimilarityRanker * ChatMessage: rm outdated tests * fail fast false * typo	2025-04-25 09:07:57 +02:00
Stefano Fiorucci	e3d4e21237	test: mark more tests as slow (#9296 ) * test: mark tests as slow * alphabetical order; install xet * revert pyproject * Trigger Build * simplify tests as suggested * add comment to workflow	2025-04-24 10:25:13 +02:00
Grig Alex	14669419f2	feat: Allow OpenAI client config in other components (#9270 ) * Add http config to generators * Add http config to RemoteWhisperTranscriber * Add http config to embedders * Add notes of http config * disable linter too-many-positional-arguments --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai> Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>	2025-04-22 09:44:55 +00:00
Amna Mubashar	498637788a	feat: Allow OpenAI client config in `OpenAIChatGenerator` and `AzureOpenAIChatGenerator` (#9215 ) * Allow OpenAI client config in chat generator * Add init_http_client as a util method * Update azure chat gen * Fix linting	2025-04-16 18:32:13 +02:00
MetroCat69	f7ac4b35cb	feat: add `run_async` for `HuggingFaceAPIDocumentEmbedder` (#9226 ) * added async support for HuggingFaceAPIDocumentEmbedder * added type anotations, removed unused import * Trigger mark test complited * Apply suggestions from code review * utility function --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2025-04-16 09:54:36 +02:00
David S. Batista	45aa9608b5	removing async test for non-existant model (#9208 )	2025-04-10 12:38:35 +02:00
Arseniy Shkunkov	bac29d9337	feat: add run_async for HuggingFaceAPITextEmbedder (#9204 ) * Initial commit * adding release notes * adding async integrations tests --------- Co-authored-by: David S. Batista <dsbatista@gmail.com>	2025-04-10 11:44:40 +02:00
Stefano Fiorucci	45cd6f43d6	feat: make `AzureOpenAIDocumentEmbedder` inherit from `OpenAIDocumentEmbedder` - async support (#9189 ) * feat: make AzureOpenAIDocumentEmbedder inherit from OpenAIDocumentEmbedder - async support * fix type * rm unused import * do not replace newlines * fix test	2025-04-08 12:51:45 +02:00
Stefano Fiorucci	6f4e70050f	feat: make `AzureOpenAITexttEmbedder` inherit from `OpenAITextEmbedder` - async support (#9188 ) * draft * updates * relnote	2025-04-08 12:51:34 +02:00
Francesco Nuzzo	c539ffa4c3	feat: add run_async for OpenAITextEmbedder (#9084 ) * feat: add run_async for OpenAITextEmbedder * fix: typing * fix: avoid replacing newlines with spaces. Also fix kwargs "input" field to include prefix and suffix * ci: add release notes * expand release notes; unit tests --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com> Co-authored-by: Michele Pangrazzi <xmikex83@gmail.com> Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2025-04-08 07:20:10 +00:00
Mohammed Abdul Razak Wahab	a2f73d134d	feat(embedders): Add async support for OpenAI document embedder (#9140 ) * feat(embedders): Add async support for OpenAI document embedder * add release notes * resolve review comments * Update releasenotes/notes/openai-document-embedder-async-support-b46f1e84043da366.yaml Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com> * Update openai-document-embedder-async-support-b46f1e84043da366.yaml --------- Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com> Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> Co-authored-by: David S. Batista <dsbatista@gmail.com>	2025-04-04 11:55:59 +00:00
Stefano Fiorucci	d1db061058	test: make Azure embedders correctly run on PRs from forks (#9168 )	2025-04-04 11:16:58 +02:00
Amna Mubashar	dd6ff10d3b	feat: allow OpenAI client config in AzureOpenAI embedders (#9136 ) * Allow OpenAI client config	2025-04-02 16:50:48 +02:00
scara	18367203a8	fix: manage max_retries=0 in AzureOpenAIGenerator, AzureOpenAIChatGenerator, AzureOpenAITextEmbedder, AzureOpenAIDocumentEmbedder (#9128 ) * fix: manage max_retries=0 in AzureOpenAIGenerator and AzureOpenAIChatGenerator * fix: manage max_retries=0 in AzureOpenAITextEmbedder and AzureOpenAIDocumentEmbedder	2025-03-28 13:11:09 +01:00
Stefano Fiorucci	1c1030efc6	chore: make Haystack warnings consistent (#9083 ) * chore: make Haystack warnings consistent * more structured logging * small fixes	2025-03-21 18:18:55 +01:00
Sebastian Husch Lee	4edefe3e56	Feat: Support Azure Workload Identity Credential (#9012 ) * Start adding support for passing callable to Azure components * Add to chat version * Fix test * Add reno * Add support to azure doc and text embedder * Rename * update llm metadata extractor * Add tests for text embedder * Update tests * Remove unused fixture and import * Update reno	2025-03-12 13:45:40 +01:00
Mohammed Abdul Razak Wahab	0d65b4caa7	feat: Enhance error handling in Azure document embedder (#8941 ) * feat: Enhance error handling in Azure document embedder * add release notes * address review comments * Update releasenotes/notes/add-azure-embedder-exception-handler-c10ea46fb536de3b.yaml Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * more alignment with OpenAI impl --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2025-03-04 11:16:08 +01:00
Julian Risch	6652dd7550	Revert "test: skip HF API live integration tests (#8889 )" (#8914 ) * Revert "test: skip HF API live integration tests (#8889)" This reverts commit 56a3a9bd61b7391ae91e3d8179b3b33918ef4932. * Replace zephyr-7b-beta model with SmolLM2-1.7B-Instruct * Use zephyr-7b-beta model but extend instructions --------- Co-authored-by: David S. Batista <dsbatista@gmail.com>	2025-02-25 09:03:20 +01:00
Stefano Fiorucci	56a3a9bd61	test: skip HF API live integration tests (#8889 ) * skip HF API integration tests * better wording	2025-02-20 16:38:57 +00:00
Ulises M	bfdad40a80	feat: Add ONNX & OpenVINO backend support, and torch dtype kwargs in Sentence Transformers Components (#8813 ) * initial rough draft * expose backend instead of extracting from model_kwargs * explictly set backend model path * add reno * expose backend for ST diversity backend * add dtype tests and expose kwargs to ST ranker for backend parameters * skip dtype tests as torch isnt compiled with cuda * add new openvino dependency release, unskip tests * resolve suggestion * mock calls, turn integrations into unit tests * remove unnecessary test dependencies	2025-02-13 12:04:14 +01:00
György Orosz	d2348ad462	feat: SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder can accept and pass any arguments to SentenceTransformer.encode (#8806 ) * feat: SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder can accept and pass any arguments to SentenceTransformer.encode * refactor: encode_kwargs parameter of SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder mae to be the last positional parameter for backward compatibility reasons * docs: added explanation for encode_kwargs in SentenceTransformersTextEmbedder and SentenceTransformersDocumentEmbedder * test: added tests for encode_kwargs in SentenceTransformersTextEmbedder and SentenceTransformersDocumentEmbedder * doc: removed empty lines from docstrings of SentenceTransformersTextEmbedder and SentenceTransformersDocumentEmbedder * refactor: encode_kwargs parameter of SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder mae to be the last positional parameter for backward compatibility (part II.)	2025-02-05 16:09:35 +00:00
Stefano Fiorucci	5ae94886b2	fix: fix test failures with Transformers models in PRs from forks (#8809 ) * trigger * try pinning sentence transformers * make integr tests run right away * pin transformers instead * older transformers version * rm transformers pin * try ignoring cache * change ubuntu version * try removing token * try again * more HF_API_TOKEN local deletions * restore test priority * rm leftover * more deletions * moreee * more * deletions * restore jobs order	2025-02-04 19:08:37 +01:00
Stefano Fiorucci	877f826da0	refactor: HF API Embedders - use `InferenceClient.feature_extraction` instead of `InferenceClient.post` (#8794 ) * HF API Embedders: refactoring * rename variables * rm leftovers * rm pin * rm unused import * relnote * warning with truncate/normalize and serverless inference API * test that warnings are raised	2025-02-03 15:11:16 +00:00
Amna Mubashar	db76ae2847	feat: add `default_headers` for Azure embedders (#8699 ) * Add default_headers param to azure embedders	2025-01-12 17:41:38 +01:00
mathislucka	fe9b1e29d4	CI: fix format after newly introduced formatting rules from ruff release (#8696 )	2025-01-09 16:25:55 +00:00
Sebastian Husch Lee	14895f6573	chore: Use token instead of use_auth_token because of deprecation warning (#8552 ) * Use token instead of use_auth_token because of deprecation warning * Fix test * pylint * fix linting --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-11-18 11:58:22 +00:00
Ivo Bellin Salarin	c78545dfc0	feat(openai): be tolerant to exceptions (#8526 ) * feat: be tolerant to exceptions if ever an error is raised by the OpenAI API, don't fail the entire processing * fix: missing import, string separator * Enhance error handling * Use batched from more_itertools for compatibility with older Python versions * Fix batching and add test --------- Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>	2024-11-15 10:52:44 +01:00
Ajit Singh	6cf13e8b98	enhancement: reduced usage of numpy and substituted built-in libraries (#8418 ) * reduced usage of numpy and substituted built-in libraries * added release note * edited expit function to support both float as well as list (this case was giving error CI) * revert code , numpy can't be removed here * more cleaning * fix relnote --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2024-10-18 15:42:19 +02:00
Alper	b40f0c8b5d	feat: SentenceTransformersTextEmbedder supports `config_kwargs` (#8432 ) * add config_kwargs * disable PLR0913 for a specific function * add a release note * refer to AutoConfig in config_kwargs docstring --------- Co-authored-by: David S. Batista <dsbatista@gmail.com> Co-authored-by: Julian Risch <julianrisch@gmx.de>	2024-10-14 16:08:53 +00:00
David S. Batista	b81abc0c85	feat: SentenceTransformersDocumentEmbedder supports `config_kwargs` (#8433 ) * initial import * adding release notes	2024-10-14 17:43:04 +02:00
David S. Batista	97126eb544	fix: changing default model to `gpt-4o-mini` on OpenAI API calls (#8360 ) * chaning default model to gpt-4o-mini * adding release notes * fixing some missed tests * fixing some more missed tests * fixing one last missed test * fixing linting issues * making pylint happy about an end2end test * chaning if test to walruss operator * fixing azure embedder from ada to text-embedding-ada-002	2024-09-17 10:36:42 +02:00
Sebastian Husch Lee	06dd5c2f37	feat (v2): Update so `model_max_length` updates `max_seq_length` for Sentence Transformers (#8334 ) * Update so model_max_length does what is expected * Add release notes * Some fixes * Another test	2024-09-06 11:37:56 +02:00
Nicola Procopio	4c798470b2	added `precision` parameter to sentence transformers embeddings (#8179 ) * added `precision` parameter to sentence transformers embeddings * fixed test * Update haystack/components/embedders/sentence_transformers_document_embedder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update test/components/embedders/test_sentence_transformers_text_embedder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update test/components/embedders/test_sentence_transformers_text_embedder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * fix format * Update sentence_transformers_text_embedder.py --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-08-09 11:38:47 +02:00
Sebastian Husch Lee	c90495c2e8	feat: Add model and tokenizer kwargs to `TransformersSimilarityRanker`, `SentenceTransformersDocumentEmbedder`, `SentenceTransformersTextEmbedder` (#8145 ) * Start adding model and tokenizer kwargs support * Add model and tokenizer kwargs to doc embedder * Some updates and fixes in tests * Fix more tests * Fix tests * Add release note * Fix test * Add from_dict tests	2024-08-02 10:37:10 +02:00
Nicola Procopio	47f4db8698	added truncate_dim to sentence transformers embedder (#8077 ) * added truncate_dim to sentence transformers embedder * Update haystack/components/embedders/sentence_transformers_document_embedder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update releasenotes/notes/release-note-2b603a123cd36214.yaml Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * fixed parameter description * added test for truncation to text embedder * fix format --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2024-07-26 10:39:48 +02:00
Sebastian Husch Lee	c121c86c4c	fix: Fix from_dict methods of components using HF models to work with default values (#8003 ) * Fix from_dict to work if device isn't provided in init params * Minor refactoring of from_dict for components that load HF models * Add tests * Update tests to test loading with all default parameters * Add more tests * Add release notes * Add unit test for whisper local * Update reno * Add fix for ExtractiveReader * Fix NamedEntityExtractor	2024-07-10 12:18:05 +02:00
Nitanshu Vashistha	cd8a5b98fe	feat: Configure max_retries & timeout for AzureOpenAITextEmbedder (#7993 ) max_retries: if not set is read from the OPENAI_MAX_RETRIES env variable or set to 5. timeout: if not set is read from the OPENAI_TIMEOUT env variable or set to 30. Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>	2024-07-09 09:56:46 +02:00
Nitanshu Vashistha	f9d53c5ca8	feat: Configure max_retries and timeout for AzureOpenAIDocumentEmbedder (#7994 ) * feat: Configure max_retries & timeout for AzureOpenAIDocumentEmbedder max_retries: if not set is read from the OPENAI_MAX_RETRIES env variable or set to 5. timeout: if not set is read from the OPENAI_TIMEOUT env variable or set to 30. Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com> * Update retries-and-timeout-for-AzureOpenAIDocumentEmbedder-006fd84204942e43.yaml * Update haystack/components/embedders/azure_document_embedder.py * Update haystack/components/embedders/azure_document_embedder.py --------- Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com> Co-authored-by: David S. Batista <dsbatista@gmail.com>	2024-07-08 22:35:25 +02:00
Vladimir Blagojevic	535a281eec	feat: Add option to use `HF_TOKEN` as env var for authentication across all HF components (#7942 ) * Read both HF_API_TOKEN and HF_TOKEN env vars in all HF related components * Add reno note * Test fixes * More test updates * More test updates	2024-06-27 10:31:58 +02:00
Stefano Fiorucci	75ad76a7ce	chore: remove deprecated TEI embedders (#7907 ) * remove deprecated TEI embedders * rm from the embedders init * rm related tests	2024-06-21 10:36:12 +02:00
Carlos Fernández	686a4999cf	feat: widen support of env vars in OpenAI components (#7653 ) * add enviroment variables to the _enviroment.py file * add support for two of the three variables * Add support for 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' on OpenAIDocument Ebedder. * Replicate support for env vars in OpenAITextEmbedder. * Add support for env vars in OpenAIGenerator.. * Add support for env vars in OpenAIChatGenerator. * add docstrings and reno * add params to __init__ in OpenAIDocumentEmbedder * add params to __init__ in OpenAITextEmbedder * make fully functional implementation of env vars and unit tests * update reno * Update haystack/components/embedders/openai_text_embedder.py * reverse changes to telemetry/_enviroment.py * Update haystack/components/embedders/openai_text_embedder.py --------- Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>	2024-05-15 21:58:41 +00:00
Sebastian Husch Lee	a2be90b95a	fix: Update device deserialization for components that use local models (#7686 ) * fix: Update device deserializtion for SentenceTransformersTextEmbedder * Add unit test * Fix unit test * Make same change to doc embedder * Add release notes * Add same change to Diversity Ranker and Named Entity Extractor * Add unit test * Add the same for whisper local * Update release notes	2024-05-14 08:36:14 +02:00
Massimiliano Pippi	10c675d534	chore: add license header to all modules (#7675 ) * add license header to modules * check license header at linting time	2024-05-09 13:40:36 +00:00
Stefano Fiorucci	7c9532b200	fix broken serialization of HFAPI components (#7661 )	2024-05-08 17:14:37 +02:00
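
Note on the `timeout` / `max_retries` entries (#7993, #7994, #7653): those commit messages describe a fallback order of explicit argument, then the `OPENAI_TIMEOUT` / `OPENAI_MAX_RETRIES` environment variable, then a default of 30 and 5. A minimal sketch of that resolution order follows. The helper names `resolve_timeout` and `resolve_max_retries` are hypothetical and are not taken from the Haystack code base; only the environment variable names and the defaults come from the commit messages.

```python
import os
from typing import Optional


def resolve_timeout(timeout: Optional[float] = None) -> float:
    """Hypothetical helper: explicit value, else OPENAI_TIMEOUT, else 30.0."""
    if timeout is not None:
        return timeout
    return float(os.environ.get("OPENAI_TIMEOUT", "30.0"))


def resolve_max_retries(max_retries: Optional[int] = None) -> int:
    """Hypothetical helper: explicit value, else OPENAI_MAX_RETRIES, else 5."""
    # Compare against None rather than relying on truthiness, so that an
    # explicit max_retries=0 is kept instead of being treated as "not set".
    if max_retries is not None:
        return max_retries
    return int(os.environ.get("OPENAI_MAX_RETRIES", "5"))
```

The `is not None` check is a deliberate choice in this sketch: with `max_retries or default`, a caller passing 0 would silently get the default, which looks like the kind of case the `max_retries=0` fix in #9128 is concerned with.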

73 Commits