haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-11-16 01:54:35 +00:00

Author	SHA1	Message	Date
Sebastian Husch Lee	81c0cefa41	refactor: Refactor hf api chat generator (#9449 ) * Refactor HFAPI Chat Generator * Add component info to generators * Fix type hint * Add reno * Fix unit tests * Remove incorrect dev comment * Move _convert_streaming_chunks_to_chat_message to utils file	2025-05-27 15:55:06 +02:00
atopx	3deaa20cb6	feat: Add HuggingFace API (text-embeddings-inference for rerank model) for component.rankers (#9414 ) * feat(component.rankers): Add HuggingFace API (text-embeddings-inference for rerank) ranker component * update test flow & doc loaders * Support run_async for HuggingFaceAPIRanker * Add release note for HuggingFace API support in component.rankers * Add release note for HuggingFace API support in component.rankers * Add release note for HuggingFace API support in component.rankers * Add release note for HuggingFace API support in component.rankers * fix: 1. `hugging_face_api.HuggingFaceAPIRanker` rename to `hugging_face_tei.HuggingFaceAPIRanker` 2. HuggingFaceAPIRanker: use our Secret API for token 3. add the missing modules for `docs/pydoc/config/rankers_api.yml` 4. added function `async_request_with_retry` for `haystack/utils/requests_utils.py` and added unittest on `test/utils/test_requests_utils.py` 4. HuggingFaceAPIRanker: refactor the retry function to support configuration based on attempts and status code. 5. HuggingFaceAPIRanker: refactor the test into unit tests using mocks * fix(HuggingFaceTEIRanker): change the token check logic to use the resolve_value method. * fix(format): run `hatch run format` * fix: - Force keyword-only arguments in __init__ method by adding , - Clarify token docstring that it's not always required - Copy documents to avoid modifying original objects - Remove test file from slow workflow - Add monkeypatch environment variable cleanup in tests - Fix missing module in rankers_api.yml and sort modules alphabetically - Remove unnecessary test info from release notes fix HuggingFaceTEIRanker： - "None" of "Optional[Secret]" has no attribute "resolve_value" - run/run_async: too many parameters * fix(HuggingFaceTEIRanker) :Revise the docstring of the HuggingFaceTEIRanker, improve the parameter descriptions, ensure consistency and clarity. Add error handling information to enhance the readability of the API response. * fix：unit test for HuggingFaceTEIRanker raise message * fix fmt * minor refinements * refine release note --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com>	2025-05-27 12:44:54 +02:00
Sebastian Husch Lee	db3d95b12a	refactor: Refactor openai generator (#9445 ) * Refactor openai generator and chat generator to reusue same util methods * Start fixing tests * More fixes * Fix mypy * Fix	2025-05-27 12:44:17 +02:00
Amna Mubashar	64def6d41b	feat: add component name and type to `StreamingChunk` (#9426 ) * Stream component name in openai * Fix type * PR comments * Update huggingface gen * Typing fix * Update huggingfacelocal gen * Fix errors * Remove model changes * Fix minor errors * Update releasenotes/notes/add-component-info-dataclass-be115dee2fa50abd.yaml Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com> * PR comments * update annotation * Update hf files * Fix linting * Add a from_component method * use add_component --------- Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>	2025-05-27 12:23:40 +02:00
Stefano Fiorucci	085c3add41	ci: prevent DocumentWriter tests from blocking CI (#9448 )	2025-05-27 12:10:21 +02:00
Stefano Fiorucci	d8487c4d8d	chore: make mypy run with `--check-untyped-defs`; fix some errors (#9447 ) * chore: make mypy run with --check-untyped-defs; fix some errors * small fixes * use HfPipeline * fix license error	2025-05-27 07:35:25 +00:00
David S. Batista	da60156174	chore: removing unused imports from tests (#9446 )	2025-05-26 16:22:51 +00:00
David S. Batista	2092bedb90	chore: removing unused imports from tests (#9444 )	2025-05-26 13:41:36 +00:00
David S. Batista	c82a3377f2	chore: cleaning up tests (#9443 )	2025-05-26 15:12:19 +02:00
Seth Peters	f025501792	fix: `LLMMetadataExtractor` bug in handling `Document` objects with no content * test(extractors): Add unit test for LLMMetadataExtractor with no content Adds a new unit test `test_run_with_document_content_none` to `TestLLMMetadataExtractor`. This test verifies that `LLMMetadataExtractor` correctly handles documents where `document.content` is None or an empty string. It ensures that: - Such documents are added to the `failed_documents` list. - The correct error message ("Document has no content, skipping LLM call.") is present in their metadata. - No actual LLM call is attempted for these documents. This test provides coverage for the fix that prevents an AttributeError when processing documents with no content. * chore: update comment to reflect new behavior in _run_on_thread method * docs: Add release note for LLMMetadataExtractor no content fix * Update releasenotes/notes/fix-llm-metadata-extractor-no-content-910067ea72094f18.yaml * Update fix-llm-metadata-extractor-no-content-910067ea72094f18.yaml --------- Co-authored-by: David S. Batista <dsbatista@gmail.com>	2025-05-23 18:57:39 +02:00
Amna Mubashar	720cc19d7d	feat: add serialization to `State` / move `State` to agents.state (#9345 ) * Add serialization to State * Add release notes * Deprecate State in dataclasses * Fix tests * Remove state_utils test * Fix linting * Fix formating * Update tests and remove old state utils * Update agents test * Update deserilaization per review * Linting * Add tests for edge case (custom class types) * Fix type serialization * PR comments * Move State to agents * Fix tests * Update utils init * Improve seriliaztion/deser * Update the release notes * Minor fix in docstrings * PR comments * Add deprecation warnign for state utils * Recreate the serialization methods to use schema * Update key names * Make serialization methods private	2025-05-23 11:04:15 +02:00
David S. Batista	ba41696bba	chore: removing unused fixtures in test functions	2025-05-23 09:43:01 +02:00
Vladimir Blagojevic	167229f328	feat: Extend AnswerBuilder for Agent (#9406 ) * Extend AnswerBuilder for Agent * Update tests * Add reno note * PR feedback * Add a better unit test * Update haystack/components/builders/answer_builder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update haystack/components/builders/answer_builder.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * PR feedback * Remove copy --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2025-05-22 14:32:36 +02:00
Sebastian Husch Lee	e6a53b9dca	fix: Add missing `timeout` and `max_retries` to `OpenAITextEmbedder` and `OpenAIDocumentEmbedder` (#9421 ) * Add missing params to to_dict for OpenAI embedders * add reno * Track variable internally instead of using client	2025-05-22 09:19:14 +00:00
Stefano Fiorucci	17432f710d	feat: introduce `SentenceTransformersSimilarityRanker` (#9415 ) * new component + tests * soft deprecation of TransformersSimilarityRanker + reno * add comp files to slow workflow * Apply suggestions from code review Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com> * self.model -> self._cross_encoder * recommend installing sentence-transformers>=4.1.0 --------- Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>	2025-05-21 10:52:46 +02:00
Amna Mubashar	995fa18607	feat: stream `ToolResult` from run_async in Agent (#9407 ) * Add async run * Add release notes * Update the run async * Fixes * Fix linting * Add tests * Fix tests * Remove changes from Tool * Linting updates * Update haystack/components/tools/tool_invoker.py Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com> * Updates tests based on comments * Update release notes --------- Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>	2025-05-21 10:22:38 +02:00
Jan Trienes	83b087caf4	feat: add `local_files_only` to sentence-transformers embedders (#9400 ) * feat: add to sentence-transformers embedders * add release note * Fix wording Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2025-05-19 16:11:49 +00:00
Sebastian Husch Lee	707573d967	feat: Streamline using `Agent` as a `ComponentTool` (#9388 ) * Make agent as a tool more streamlined * Add reno * fix mypy	2025-05-16 13:11:43 +02:00
Sebastian Husch Lee	af073852d0	feat: Add `usage` when using `HuggingFaceAPIChatGenerator` with streaming (#9371 ) * Small fix and update tests * Add usage support to streaming for HuggingFaceAPIChatGenerator * Add reno * try using provider='auto' * Undo provider * Fix unit tests * Update releasenotes/notes/add-usage-hf-api-chat-streaming-91fd04705f45d5b3.yaml Co-authored-by: Julian Risch <julian.risch@deepset.ai> --------- Co-authored-by: anakin87 <stefanofiorucci@gmail.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2025-05-15 13:09:36 +02:00
Sebastian Husch Lee	9ae76e1653	Fix component tool parameters (#9342 ) * Starting property schema refactor * Adding more tests * More tests * Handle null type explicitly * More updates of tests to accomodate Optional properly * Fix more tests * Remove unecessary check * Some cleanup * Update test * Add reno * Fix typing * Add license header * Use docstrings of dataclasses in parameter spec generation * More tests of Haystack dataclass types * Properly handle Sequence * Fix license header * Update OpenAI tests to add more complicated tool parameter signature * Properly set required for dataclasses * Add integration test for azure that includes additionalProperties * Add more complicated integration test for HuggingFaceAPIChatGenerator * Alternate approach using pydantic like we do in from_function.py * Cleanup and fix other affected tests * Fix mypy * PR comments * PR comment * Remove test from HF API * Update reno * Update reno	2025-05-15 07:51:06 +00:00
David S. Batista	42b378950f	fix: `DocumentRecallEvaluator` changing division and adding checks for emptiness of documents (#9380 ) * changing division and adding checks for emptiness of documents * adding release notes * adding tests * Update releasenotes/notes/updated-doc-recall-eval-uniqueness-59b09082cf8e7593.yaml Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * attending PR comments * Update releasenotes/notes/updated-doc-recall-eval-uniqueness-59b09082cf8e7593.yaml * Update releasenotes/notes/updated-doc-recall-eval-uniqueness-59b09082cf8e7593.yaml Co-authored-by: Julian Risch <julian.risch@deepset.ai> * Update haystack/components/evaluators/document_recall.py Co-authored-by: Julian Risch <julian.risch@deepset.ai> * Update haystack/components/evaluators/document_recall.py Co-authored-by: Julian Risch <julian.risch@deepset.ai> * Update haystack/components/evaluators/document_recall.py Co-authored-by: Julian Risch <julian.risch@deepset.ai> * Update haystack/components/evaluators/document_recall.py Co-authored-by: Julian Risch <julian.risch@deepset.ai> * adding tests * linting --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> Co-authored-by: Julian Risch <julian.risch@deepset.ai>	2025-05-14 11:37:47 +02:00
Sebastian Husch Lee	9f2c0679d4	Small fix and update tests (#9370 )	2025-05-12 22:02:26 +02:00
David S. Batista	f233e06f0a	feat : adding a new `Protocol` for `TextEmbedder` (#9353 ) * initial import * removing unused imports * adding an Embbeder Protocol * adding tests * adding tests * adding release notes * renaming dir * removing dir * cleaning * adding clean tests * dealing eith elipsis and pylint * wip: extending tests * cleaning extended tests * adding an invalid TextEmbedder	2025-05-12 12:35:09 +02:00
Stefano Fiorucci	4b4b0f0041	fix: `HuggingFaceAPIChatGenerator` - make tool conversion compatible with `huggingface_hub>=0.31.0` (#9354 ) * fix: HuggingFaceAPIChatGenerator - make tool conversion compatible with huggingface_hub>=0.31.0 * relnote	2025-05-07 18:37:05 +02:00
Amna Mubashar	64f384b52d	feat: enable streaming ToolCall/Result from Agent (#9290 ) * Testing solutions for streaming * Remove unused methods * Add fixes * Update docstrings * add release notes and test * PR comments * add a new util function * Adjust emit_tool_info * PR comments * Remove emit function, add streaming for tool_call --------- Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>	2025-05-05 16:23:44 +02:00
David S. Batista	0f00c1882e	fix: make `SentenceSplitter` QUOTE_SPANS_RE regex ReDoS-safe (#9338 ) * fix: make QUOTE_SPANS_RE regex ReDoS-safe * Removing the capture of leading non-character on double quotes, allowing quote with new lines, adding tests * cleaning * fixing release notes * changing import * adding test for Regex Denial of Service (ReDoS) * reducing the size/time of tests * Update test/components/preprocessors/test_sentence_tokenizer.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * Update test/components/preprocessors/test_sentence_tokenizer.py --------- Co-authored-by: Waivey <waivey@proton.me> Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2025-05-02 15:40:17 +00:00
Stefano Fiorucci	e3f9da13d0	test: fix test incorrectly marked as async (#9327 ) * test: fix test incorrectly marked as async * fix inmemory async tests	2025-04-30 14:07:30 +00:00
David S. Batista	201becd400	fix: `RecursiveSplitter` bug in the case when the recursive chunking is triggered (#9316 ) * initial import * adding release notes * Update fixing-bug-recursive-splitter-88d5714529f84e4e.yaml	2025-04-30 13:03:23 +02:00
Yassin Nouh	ed6176a8cb	fix: make `HuggingFaceAPIChatGenerator` convert Tool Call `arguments` from string (#9303 ) * fix: sort imports in hugging_face_api.py * fix: import logging in hugging_face_api.py * fix: refactor HuggingFace API tool call handling for improved argument conversion * Update haystack/components/generators/chat/hugging_face_api.py Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com> * refinements + tests + relnote * simplify --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2025-04-28 15:36:19 +02:00
Mohammed Abdul Razak Wahab	53308a6294	feat: Add sanitization for Meta field during serialization (#9272 ) * feat: Add sanitization for Meta field during serialization * Revert "feat: Add sanitization for Meta field during serialization" This reverts commit c529f7c25b69aed626bb2072c8bf171815b591cc. * feat: add nested serialization in openai usage object * add reno * add nested serialization in OpenAiChatGenerator * Update releasenotes/notes/nested-serialization-openai-usage-object-3817b07342999edf.yaml Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com> * merge tests * Adjust the test --------- Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com> Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2025-04-26 15:04:02 +05:00
Sebastian Husch Lee	0fdb88424b	fix: Fix Azure test on forks (#9312 ) * Fix unit test * Fix test	2025-04-25 11:10:59 +02:00
Stefano Fiorucci	38c39a49de	test: review integration tests (#9306 ) * AzureOCR: convert integration test to unit test and simplify * clean up HuggingFaceAPITextEmbedder * clean up LinkContentFetcher * simplify HuggingFaceLocalGenerator * clean up OpenAIGenerator * OpenAIChatGenerator * SentenceTransformersDiversityRanker * TransformersSimilarityRanker * ChatMessage: rm outdated tests * fail fast false * typo	2025-04-25 09:07:57 +02:00
Mohammed Abdul Razak Wahab	f97472329f	feat: Add support for multiple outputs in ConditionalRouter (#9271 ) * feat: Add support for multiple outputs in ConditionalRouter * Update haystack/components/routers/conditional_router.py Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com> * add additional route --------- Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2025-04-24 16:17:06 +02:00
Michele Pangrazzi	4a908d075e	Fix OpenAIGenerator and OpenAIChatGenerator to allow wrapped streaming objects usage (#9304 ) * Fix for handling wrapped ChatCompletion instances in streaming (used by tools like weave) * Add release note * Applied same fix to OpenAIGenerator ; Refactoring ; Update release note * Fix integration test error after refactoring	2025-04-24 16:16:41 +02:00
Stefano Fiorucci	e3d4e21237	test: mark more tests as slow (#9296 ) * test: mark tests as slow * alphabetical order; install xet * revert pyproject * Trigger Build * simplify tests as suggested * add comment to workflow	2025-04-24 10:25:13 +02:00
Stefano Fiorucci	df662daaef	test: improve some slow tests (#9297 ) * test: improve slow tests * rm leftover and improve test	2025-04-24 08:50:36 +02:00
Stefano Fiorucci	9ae7da8df3	test: workflow for slow/unstable integration tests (#9267 ) * workflow for slow integration tests * try changing skipper * Trigger Build * better names * fix * mv tika to slow * try skipping slow workflow * retry paths-ignore * remove skipper * Revert "remove skipper" This reverts commit 302ed2f07f36b33fa61fde0843b5590d79b98d74. * better skipper * retry * Revert "retry" This reverts commit fe5dff68f496645cc45292d74fcd8d043e868392. * try using one workflow * trigger * try to see if it fails * cosmetic changes * improvements * try matrix * retry * fix * clean up * simplify datadog monitoring and trigger * send event to datadog for nightly failures * tests should run if: manual trigger, scheduled, PR has label, release branch, or relevant files changed * clarify slow marker * improve comments * labels	2025-04-23 10:36:44 +02:00
Mohammed Abdul Razak Wahab	ddd7318ae8	fix: use coerce_tag_value in LoggingTracer to serialize tag values (#9251 ) * fix: use coerce_tag_value in LoggingTracer to serialize tag values * add rn * fix tests --------- Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>	2025-04-22 16:18:24 +02:00
Sebastian Husch Lee	d5ae46bc93	feat: Add Toolset to Agent (#9284 ) * Add Toolset to Agent * Add reno	2025-04-22 14:08:34 +02:00
Grig Alex	14669419f2	feat: Allow OpenAI client config in other components (#9270 ) * Add http config to generators * Add http config to RemoteWhisperTranscriber * Add http config to embedders * Add notes of http config * disable linter too-many-positional-arguments --------- Co-authored-by: Julian Risch <julian.risch@deepset.ai> Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>	2025-04-22 09:44:55 +00:00
Sebastian Husch Lee	114b4568ba	Fix state_schema serialization for agent tracing (#9278 )	2025-04-22 09:39:41 +02:00
Sebastian Husch Lee	0f374e0563	Fix from_dict and update test (#9277 )	2025-04-22 06:59:03 +00:00
Sebastian Husch Lee	19cf220136	feat: integrate two ready-made SuperComponents from haystack-experimental (#9235 ) * Add super component decorator * Add reno * MultiFileConverter * Add DocumentPreprocessor * Add reno * Add tests and change doc preprocessor to split first then clean * Remove code from merge * Add to pydoc and missing test file * PR comments * Lint fix * Fix mypy * Fix mypy * Add comment * PR comments * Update haystack/components/converters/multi_file_converter.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/preprocessors/document_preprocessor.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/preprocessors/document_preprocessor.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/preprocessors/document_preprocessor.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/preprocessors/document_preprocessor.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/preprocessors/document_preprocessor.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/preprocessors/document_preprocessor.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/preprocessors/document_preprocessor.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/preprocessors/document_preprocessor.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/preprocessors/document_preprocessor.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/preprocessors/document_preprocessor.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * Update haystack/components/converters/multi_file_converter.py Co-authored-by: Daria Fokina <daria.fokina@deepset.ai> * PR comments * PR comment --------- Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>	2025-04-17 10:02:26 +00:00
Amna Mubashar	498637788a	feat: Allow OpenAI client config in `OpenAIChatGenerator` and `AzureOpenAIChatGenerator` (#9215 ) * Allow OpenAI client config in chat generator * Add init_http_client as a util method * Update azure chat gen * Fix linting	2025-04-16 18:32:13 +02:00
Sebastian Husch Lee	cdc53cae78	fix: Add `batch_size` to `to_dict` of TransformersSimilarityRanker (#9248 ) * Add missing batch_size to to_dict of similarity ranker * Add reno	2025-04-16 12:16:59 +02:00
MetroCat69	f7ac4b35cb	feat: add `run_async` for `HuggingFaceAPIDocumentEmbedder` (#9226 ) * added async support for HuggingFaceAPIDocumentEmbedder * added type anotations, removed unused import * Trigger mark test complited * Apply suggestions from code review * utility function --------- Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>	2025-04-16 09:54:36 +02:00
Sebastian Husch Lee	f46bf14851	fix: Allow Agent to run with no tools (#9230 ) * Fix * Add reno * Add test * Update docstring and warning message * Update docstring	2025-04-16 07:53:21 +02:00
Sebastian Husch Lee	185e1c79c9	feat: Agent tracing (#9240 ) * Agent tracing * Small changes * Some changes and refactoring * Refactoring to reuse code * Fix * Add reno * Fix tests * Fix tests * Fix linting * Refactor and add tracing support to run_async of Agent * Reduce duplicate code * Remove finalize_run * Use break instead of copying code three times * Adding a test * Add tracing unit tests * Make async tracing test actually run async * Increase test coverage * Unit test for traces in pipeline * Add cleanup * Fix proper indentation * PR comments * PR comments and new test * Update warning message * Update warning message --------- Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>	2025-04-15 15:58:26 +02:00
Stefano Fiorucci	656fe6dc6e	chore: LLM Evaluators - remove deprecated parameters (#9219 )	2025-04-15 09:26:31 +02:00
David S. Batista	d860a73ddb	chore: cleaning duplicated import (#9234 )	2025-04-15 09:25:19 +02:00


507 Commits