582 Commits

Author SHA1 Message Date
Stefano Fiorucci
bc8a4754d2
test: use small Sentence Transformers models in tests (#9802)
* test: use small Sentence Transformers models in tests

* fix
2025-09-24 09:26:51 +02:00
Sebastian Husch Lee
143b0b00e8
tests: Add more tests for OpenAIChatGenerator with different response_format options (#9810)
* Fix: only put in response_format into api args if it's not None

* Add reno

* Add more tests

* Update test

* Remove test
2025-09-23 14:51:52 +02:00
tstadel
622f922b98
feat: select tools at runtime (#9798)
* feat: select tools at runtime

* pass tools to ToolInvoker too for consistency

* refactoring

* add reno

* apply feedback and add tools to run_async

* add tests

* fix mypy

* chore: enable tool selection when running from snapshot as well

* fix pylint

* apply feedback

* Update haystack/components/agents/agent.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* Update releasenotes/notes/add-tools-to-agent-run-params-3aa9c75ee548c38d.yaml

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* add raises

* add more tests

---------

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
2025-09-23 09:07:54 +02:00
Sebastian Husch Lee
7f802656f6
chore: Refactor tool invoker (#9794)
* Refactoring tool invoker

* More refactoring

* More refactoring

* Small fix

* Fix

* max_workers was missing from ToolInvoker.to_dict
2025-09-22 09:39:52 +02:00
Arseniy Shkunkov
1fb76ec7e4
feat: add Sparse Embedders based on Sentence Transformers (#9588)
* Added backend class for SparseEncoder and also SentenceTransformersSparseTextEmbedder

* Added SentenceTransformersSparseDocumentEmbedder

* Created a separate _SentenceTransformersSparseEmbeddingBackendFactory and added tests

* Remove unused parameter

* Wrapped output into SparseEmbedding dataclass + fix tests

* Return correct SparseEmbedding, imports and tests

* fix fmt

* Style changes and fixes

* Added a test for embed function

* Added integration test and fixed some other tests

* Add lint fixes

* Fixed positional arguments

* fix types, simplify and more

* fix

* token fixes

* pydocs, small model in test, cache improvement

* try 3.9 for docs

* better to pin click

* release note

* small fix

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-09-19 14:00:13 +00:00
Sebastian Husch Lee
5bca520a48
fix: Fix MetaFieldGroupingRanker to handle unhashable subgroup_by values like list (#9791)
* Fixes

* Add reno

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-09-16 12:24:08 +02:00
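The entry above concerns grouping by metadata values that cannot be used as dictionary keys, such as lists. The following is a minimal sketch of one way to handle that, not the code from the commit: `to_group_key` and `group_by_meta` are hypothetical helpers, and falling back to `str()` for unhashable values is an assumption made for illustration.

```python
from collections import defaultdict
from collections.abc import Hashable
from typing import Any


def to_group_key(value: Any) -> Hashable:
    """Return a value usable as a dict key (hypothetical helper)."""
    try:
        hash(value)
        return value
    except TypeError:
        # Lists, dicts and sets are unhashable; use their string form instead.
        return str(value)


def group_by_meta(records: list[dict], key: str) -> dict:
    """Group plain dict records by the value stored under `key`."""
    groups = defaultdict(list)
    for record in records:
        groups[to_group_key(record.get(key))].append(record)
    return dict(groups)
```

With this approach, two records whose `key` holds equal lists land in the same group, at the cost of the group key being a string rather than the original list.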
Amna Mubashar
35c1cabb4e
feat: support structured outputs in OpenAIChatGenerator (#9754)
* Add parse for response format

* Update response_format

* Add tests

* Add release notes

* Update checks

* remove instance var

* Add tests for azure

* Add schema test

* Add comments

* Add streaming support

* PR comments

* PR comments

* Add tests

* Fix tests

* Add unit tests

* Update Azure files

* PR comments

* Small fix

* Include message.parsed

* Fix serialization

* Update the async method

* Update release notes

* Loosen tests to prevent failure

* PR comments

* Fix release notes

* Fix error
2025-09-16 11:15:28 +02:00
tstadel
0d09f7b889
feat: add system_prompt to Agent run parameters (#9778)
* enhancement: add system_prompt to Agent run parameters

* add reno

* add test
2025-09-09 18:55:42 +02:00
Abdelrahman Kaseb
34f1a04120
fix: preserve explicit lambda_threshold=0.0 in SentenceTransformersDiversityRanker (#9771)
* fix(rankers): preserve lambda_threshold=0.0 in SentenceTransformersDiversityRanker

* Add tests

* release note

* remove unreachable code, merge tests

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-09-09 09:55:12 +00:00
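The entry above is about keeping an explicitly passed `0.0` instead of silently replacing it with a default. The log does not show the original code, so the sketch below only illustrates a common way this kind of bug arises in Python (`value or default`) and the usual remedy. The function names and the default of `0.5` are hypothetical.

```python
from typing import Optional

DEFAULT_LAMBDA_THRESHOLD = 0.5  # illustrative default, not taken from the library


def resolve_with_or(value: Optional[float]) -> float:
    # Pitfall: 0.0 is falsy, so an explicit 0.0 is replaced by the default.
    return value or DEFAULT_LAMBDA_THRESHOLD


def resolve_with_none_check(value: Optional[float]) -> float:
    # Fall back to the default only when no value was given at all.
    return DEFAULT_LAMBDA_THRESHOLD if value is None else value
```

The `is None` check distinguishes "not provided" from "provided as zero", which a truthiness test cannot do.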
Stefano Fiorucci
ed8649743d
test: attempt to avoid HF API Embedders errors, fail fast when unavoidable (#9766)
* test: better retry configurations for HF API Embedders integration tests

* shorter delay, test only on Ubuntu

* try different settings

* fail fast via timeout
2025-09-05 13:15:47 +02:00
Arya Tayshete
efeb985e52
feat(fetcher): support custom request headers in LinkContentFetcher (#9760)
* feat(fetcher): support custom requests in LinkContentFetcher

* feat(fetcher): support custom request headers in LinkContentFetcher + tests

* undo changes in file
2025-09-04 13:31:15 +00:00
Rigved Telang
b17471207d
feat(websearch): add exclude_subdomains parameter to SerperDevWebSearch (#9729)
* feat: add domain filtering with subdomain exclusion to SerperDevWebSearch

- Introduced `exclude_subdomains` parameter to control whether to include subdomains in search results.
- Implemented `_is_domain_allowed` method to enforce domain filtering based on the new parameter.
- Updated tests to verify functionality of domain filtering and subdomain exclusion.

* Fix error in test

* Remove redundant test for `to_dict` method in `TestSerperDevSearchAPI` class

* Fix linting

---------

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
2025-08-29 12:22:02 +02:00
Sebastian Husch Lee
be52c685cd
refactor: Refactor Agent logic for easier readability (#9726)
* Start refactor

* Update run_async to use the new code

* Slight updates

* Refactoring of tests

* Remove messages from execution context

* Cleanup

* More cleanup

* Formatting

* Fix some typing

* ignore typing issues

* Add reno

* Adding docstrings

* Small changes

* docstrings

* Updates

* Update haystack/components/agents/agent.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* PR comments

* PR comments

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-08-21 12:27:57 +00:00
Sebastian Husch Lee
9fae8e3928
fix: Make output_type optional in MetadataRouter.from_dict for YAML loading (#9724)
* Make output type optional in yaml

* Add reno
2025-08-20 09:33:38 +02:00
David S. Batista
2f7cb9e959
!fix: FileTypeRouter raising FileNotFound in a consistent manner (#9710)
* adding raise_on_failure and warning

* adding release notes

* reverting, adding wrongly removed file

* FileNotFoundError is raised both with and without metadata passed

* reverting to raise_on_failure

* Update releasenotes/notes/fix-filetype-router-inconsistencies-b22a3af00059f953.yaml

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* adding warning and updating tests

* adding warning and updating tests

* updating docstring and warning

* updating release notes

* adding extra output key 'failed' and updating tests

* adding missed test file

* Update haystack/components/routers/file_type_router.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/routers/file_type_router.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update releasenotes/notes/fix-filetype-router-inconsistencies-b22a3af00059f953.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* test fixes

* remove duplicated tests

* updating log message

* Fix multi file converter

* updating release notes

* Update releasenotes/notes/fix-filetype-router-inconsistencies-b22a3af00059f953.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* improve relnote

* fixing typing

---------

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2025-08-19 11:12:14 +02:00
Bohan Qu
919e4930f7
feat: support subclasses of ChatMessage in state schema validation (#9718)
2025-08-18 11:46:01 +02:00
Stefano Fiorucci
d38c32e393
chore: change model and provider for HF API multimodal test (#9715)
2025-08-18 10:00:12 +02:00
Saurabh Lingam
ae6f3bcf7c
fix: fix inconsistent top_k validation in SentenceTransformersDiversityRanker (#9698)
* Fix inconsistent top_k validation in SentenceTransformersDiversityRanker
- change elif to if in run() method to ensure top_k validation always
  runs regardless of whether top_k comes from init or runtime
- Both scenarios now consistently raise ValueError with descriptive
  message format: 'top_k must be between 1 and X, but got Y'
- Fixes inconsistency where init top_k gave confusing MMR error while
  runtime top_k gave clear validation error

* improvements

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-08-14 17:34:29 +02:00
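The entry above describes making `top_k` validation run whether the value comes from init or from the run call. The sketch below shows that idea under stated assumptions: `validate_top_k` and `resolve_top_k` are hypothetical helpers, not the ranker's real methods. Only the error message format is taken from the commit message.

```python
from typing import Optional


def validate_top_k(top_k: int, n_documents: int) -> int:
    """Raise a descriptive error when top_k is outside 1..n_documents."""
    if not 1 <= top_k <= n_documents:
        raise ValueError(f"top_k must be between 1 and {n_documents}, but got {top_k}")
    return top_k


def resolve_top_k(init_top_k: int, runtime_top_k: Optional[int], n_documents: int) -> int:
    """Pick the runtime value when given, then validate in every case."""
    top_k = init_top_k
    if runtime_top_k is not None:
        top_k = runtime_top_k
    # The check is not tied to either branch, so both sources are validated.
    return validate_top_k(top_k, n_documents)
```

Because validation happens after the value is chosen, an out-of-range init value and an out-of-range runtime value produce the same error.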
Sebastian Husch Lee
c7256b2116
feat: Update source_id_meta_field in SentenceWindowRetriever to also accept a list of values (#9699)
* Make source_id_meta_field also work with a list of values

* Fix

* Add reno

* Update docstring

* Add unit test

* Update test

* Adding more tests and simplifying logic

* Simplify
2025-08-13 11:41:59 +02:00
Stefano Fiorucci
8160ea8bfc
feat: ToolInvoker - pass tools in run + general refactoring (#9704)
* draft

* more refactoring

* fixes

* tools in run + tests

* reorganize tests

* refinements

* relnote

* log overridden tools

* more static methods
2025-08-13 10:10:30 +02:00
Abdelrahman Kaseb
b9a34dfebf
Fix: prevent in-place mutation of documents in Document Classifiers and Extractors (#9703)
* modify Documents Classifiers and Extractors to not make in-place changes

* Add e2e test for NER

* Add unit test for NER

* fixes + refinements

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-08-12 15:20:44 +02:00
Sebastian Husch Lee
af9aac2b99
chore!: Update finish reason in output of HuggingFaceAPIChatGenerator to match between stream and non-stream modes (#9686)
* Update finish reason

* Fix unit test

* Add reno

* Update releasenotes/notes/update-finish-reason-hf-api-chat-gen-c700042a079733e8.yaml

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Update async as well

* Fix unit test

---------

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
2025-08-11 13:52:16 +02:00
Abdelrahman Kaseb
03d9f0fd74
fix: prevent in-place mutation of documents in Document Embedders (#9693)
* fix: prevent in-place mutation of documents after embeddings by using deepcopy

* Add tests

* use from dataclasses import replace instead of deepcopy

* Address PR comments
2025-08-11 12:21:09 +02:00
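The entry above mentions switching from `deepcopy` to `dataclasses.replace` so that the input documents are not mutated. A minimal sketch of that pattern follows. `Doc` is a hypothetical stand-in for a document dataclass and the "embedding" is a dummy value; neither is the library's own code.

```python
from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Doc:  # hypothetical stand-in for a document dataclass
    content: str
    embedding: Optional[list[float]] = None
    meta: dict = field(default_factory=dict)


def embed_documents(docs: list[Doc]) -> list[Doc]:
    """Return new Doc objects with an embedding set; inputs stay untouched."""
    # Dummy embedding: the content length as a one-element vector.
    return [replace(doc, embedding=[float(len(doc.content))]) for doc in docs]
```

Note that `replace` creates a shallow copy: fields that are not overridden, such as `meta` here, are shared between the original and the new object.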
Stefano Fiorucci
35e69369dc
feat: add ReasoningContent to ChatMessage (#9696)
* feat: add ReasoningContent to ChatMessage

* more tests

* release note

* Update haystack/dataclasses/chat_message.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

---------

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
2025-08-11 10:01:31 +02:00
Amna Mubashar
683c935b38
feat: Update MetadataRouter to support ByteStream (#9688)
* Start changes for updating metadata router

* Update the router

* releasenotes/notes/add-bytestream-support-metadata-router-1ee5149745894f42.yaml

* Add release notes

* Fix error

* Update typing

* Update the typing

* PR comments

* Update param name

* Update type name

* Fix typo

* Remove type var

* Update typing

* Update typing

* Add overloads for all filter methods

* Use type ignore

* Remove unused imports

---------

Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2025-08-08 16:21:27 +02:00
Stefano Fiorucci
47508bc1e6
fix: fix OpenAI tests for openai==1.99.3 (#9694)
* fix: fix OpenAI tests for openai==1.99.3

* fix async tests
2025-08-08 06:40:57 +00:00
Abdelrahman Kaseb
5f3c37d287
chore: adopt PEP 585 type hints (#9678)
* chore(lint): enforce and apply PEP 585 type hinting

* Run fmt fixes

* Fix all typing imports using some regex

* Fix all typing written in string in tests

* undo changes in the e2e tests

* make e2e test use list instead of List

* type fixes

* remove type:ignore

* pylint

* Remove typing from Usage example comments

* Remove typing from most of comments

* try to fix e2e tests on comm PRs

* fix

* Add typing.List in tests to adjust test compatibility
- test/components/agents/test_state_class.py
- test/components/converters/test_output_adapter.py
- test/components/joiners/test_list_joiner.py

* simplify pyproject

* improve relnote

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-08-07 10:23:14 +02:00
Chinmay Bansal
4b9fb20bab
feat: add image support to HuggingFaceAPIChatGenerator (#9680)
* feat(huggingface-api): #9671 add image support to HuggingFaceAPIChatGenerator

* docs: add release notes for image support in HuggingFaceAPIChatGenerator

* Fixed comments on PR: implementation, testing, default value for validation

* refinements

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-08-06 16:35:32 +02:00
Abdelrahman Kaseb
d0de78ec0a
fix: ensure sentence_transformers_similarity score is a float, not np.float (#9665)
* fix: ensure sentence_transformers_similarity score is a float to prevent serialization issues

* solve PR comments
2025-08-04 11:28:05 +02:00
Luca Rolshoven
f72ab7f63f
fix(embeddings): add encoding_format keyword argument when calling OpenAI's client.embeddings.create (#9655)
* fix(embeddings): add `encoding_format` keyword argument when calling OpenAI's `client.embeddings.create`.

* fix mypy

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-07-28 08:26:39 +00:00
Stefano Fiorucci
d059cf2c23
feat: add skip_empty_documents init parameter to DocumentSplitter (#9649)
* feat: add skip_empty_documents init parameter to DocumentSplitter

* improve test

* fix + relnote
2025-07-24 11:26:11 +02:00
David S. Batista
3b9b1ae802
feat: adding debugging breakpoints to Pipeline and Agent (#9611)
* wip: fixing tests

* wip: fixing tests

* wip: fixing tests

* wip: fixing tests

* fixing circular imports

* decoupling resume and initial run() for agent

* adding release notes

* re-raising BreakPointException from pipeline.run()

* fixing imports

* refactor: Refactor suggestions for Pipeline breakpoints (#9614)

* Refactoring

* Start adding debug_path into Breakpoint class

* Fully move debug_path into Breakpoint dataclass

* Simplifications in pipeline run logic

* More simplification

* lint

* More simplification

* Updates

* Rename resume_state to pipeline_snapshot

* PR comments

* Missed renaming of state in a few more places

* feat: Add dataclasses to represent a `PipelineSnapshot` and refactored to use it (#9619)

* Refactor to use dataclasses for PipelineSnapshot and AgentSnapshot

* Fix integration tests

* Mypy

* Fix mypy

* Fix lint

* Refactor AgentSnapshot to only contain needed info

* Fix mypy

* More refactoring

* removing unused import

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* feat: saving include_outputs_from intermediate results to `PipelineState` object (#9629)

* saving intermediate component results in include_outputs_from into the PipelineSnapshot

* cleaning up

* fixing tests

* fixing tests

* extending tests

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* linting

* moving intermediate results to pipeline state and adding pipeline outputs to state

* moving ordered_component_names and include_outputs_from to PipelineSnapshot

* moving original_input_data to PipelineSnapshot

* simplifying saving the intermediate results

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

---------

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* linting

* cleaning up

* avoiding creating PipelineSnapshot for every component run

* removing unnecessary code

* Update checks in Agent to not unnecessarily create AgentSnapshot when not needed.

* Update haystack/components/agents/agent.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* Update haystack/components/agents/agent.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* cleaning up tests

* linting

---------

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2025-07-24 08:54:23 +00:00
Stefano Fiorucci
33e8bd5ef6
chore: update SentenceTransformersEmbeddingBackend.embed type hint to include images (#9643)
* chore: update SentenceTransformersEmbeddingBackend type hint to include images

* fix test

* linting

* simplify
2025-07-23 15:57:43 +02:00
Amna Mubashar
8e792a3d12
fix: update _convert_streaming_chunks_to_chat_message to handle tool calls with empty arguments (#9639)
* Update util function

* Add a new test

* PR comments
2025-07-23 13:28:05 +02:00
JohnKagunda
59403de1f0
feat: added return_embedding attr in in_memory/document_store (#9622)
* feat: added  to init

* feat: added return_embedding in to_dict

* feat: added return_embedding to filter_documents

* feat: added return_embedding to bm25_retrieval

* refactor: embedding_retrieval to use return_embedding attribute rather than parameter passed

* docs: added releasenote

* fix: pop from doc_fields instead of changing return_documents attr to none

* fix: made return_embedding an optional field and removed deprecation warning

* fix: give return_embedding a higher priority than self.return_embedding

* feat: changed default behaviour of return_embedding to True

* chore: update tests after InMemory Document store update

* Update releasenotes/notes/update-in-memory-document-store-17f555695caf9d52.yaml

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* chore: update docs

* chore: enhanced clarity and readability of expression

* test: return_embedding is set to false during initialization

* test: overriding  return_embedding inside

* fix: changed the use of self.filter_documents to actual implementation inside `embedding_retrieval`

Signed-off-by: rafaeljohn9 <rafaeljohb@gmail.com>

---------

Signed-off-by: rafaeljohn9 <rafaeljohb@gmail.com>
Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
2025-07-23 10:48:14 +00:00
Sebastian Husch Lee
b9b1652fd4
feat: Add LLMDocumentContentExtractor (#9637)
* Add LLMDocumentContentExtractor

* Remove file

* Remove from slow
2025-07-23 11:16:37 +02:00
Stefano Fiorucci
c9e43c9ca2
feat: add DocumentLengthRouter (#9636)
2025-07-22 14:59:28 +02:00
Stefano Fiorucci
868ea41698
feat: add SentenceTransformersDocumentImageEmbedder (#9635)
2025-07-22 13:10:33 +02:00
Sebastian Husch Lee
f801171191
feat: Add DocumentTypeRouter (#9634)
* Add DocumentTypeRouter

* PR comments

* Turn off isort for one line so pylint will pass
2025-07-22 11:08:24 +00:00
Amna Mubashar
b3971ff574
fix: update the deserialization for tool decorator (#9618)
* Fix linting

* Add tests

* Small fixes

* check for Tool instance

* Remove unnecessary update

* PR comments

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-07-22 12:35:29 +02:00
Stefano Fiorucci
4d75ff42b4
feat: add ImageContent class methods (#9632)
2025-07-22 09:46:47 +02:00
Kane Norman
9420492798
feat: allow mime_type to be guessed for ByteStream (#9573)
* feat(bytestream): add guess_mime_type parameter

* refactor(FileTypeRouter): refactor guess mimetype

* feat(bytestream): add guess_mime_type to util

* style(ruff): add trailing whitespace

* fix: fix type annotation

* test(file_type_router): add test for additional_mimetypes param

* fix(file_type_router): non-existent file behavior

* feat(file_type_router): add release notes

* fix(file_type_router): remove unused logger

* style: fix ruff formatting magic values

* test(bytestream): handle windows/unix mimetype differences

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-07-22 07:43:22 +00:00
Stefano Fiorucci
b9fa70610f
feat: extend ChatPromptBuilder to support string templates (#9631)
2025-07-22 09:36:51 +02:00
Sebastian Husch Lee
7414ef6823
feat: Add image converters (#9628)
* Add image converters

* Fix tests

* Update haystack/components/converters/image/__init__.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-07-21 15:46:14 +00:00
Stefano Fiorucci
6a591bd027
feat: add ImageContent dataclass to include images in ChatMessage + OpenAI support (#9626)
2025-07-21 14:39:31 +02:00
Sebastian Husch Lee
393a1bd293
feat: Update sentence window retriever to add source_id_meta_field, split_id_meta_field, and raise_on_missing_meta_fields (#9610)
* Update sentence window retriever

* Improvements and more tests

* Add new variable raise_on_failure

* Update reno
2025-07-14 13:18:05 +02:00
Stefano Fiorucci
3fb2cef9e3
fix: test HFAPIChatGenerator with a different model (#9607)
2025-07-11 11:39:41 +02:00
Stefano Fiorucci
646eedf26a
chore: reenable HF API Embedders tests + improve HFAPIChatGenerator docstrings (#9589)
* chore: reenable some HF API tests + improve docstrings

* revert deletion
2025-07-04 09:39:43 +02:00
Amna Mubashar
050c987946
chore: remove backward compatibility for State deserialization (#9585)
* remove backward compatibility

* Fix linting
2025-07-03 13:20:34 +02:00
Sebastian Husch Lee
85258f0654
fix: Fix types and formatting pipeline test_run.py (#9575)
* Fix types in test_run.py

* Get test_run.py to pass fmt-check

* Add test_run to mypy checks

* Update test folder to pass ruff linting

* Fix merge

* Fix HF tests

* Fix hf test

* Try to fix tests

* Another attempt

* minor fix

* fix SentenceTransformersDiversityRanker

* skip integrations tests due to model unavailable on HF inference

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-07-03 09:49:09 +02:00