300 Commits

Author SHA1 Message Date
Stefano Fiorucci
af75774c3f
chore: rename docs directory and other adjustments (#10157)
* chore: rename docs directory and other adjustments

* fixes
2025-11-28 08:41:35 +01:00
Stefano Fiorucci
8bdcd34610
chore: remove CI scripts and configs for Readme API (#10147)
2025-11-27 13:05:55 +01:00
Daria Fokina
9611fb3590
migrating to Haystack guide (#10121)
2025-11-24 15:26:05 +01:00
Stefano Fiorucci
b096431aff
test: pin transformers<4.57 (#9852)
2025-10-05 11:14:46 +02:00
Stefano Fiorucci
707e6837b6
fix: pin openai>=1.99.2 (#9812)
2025-09-24 07:51:21 +00:00
Arseniy Shkunkov
1fb76ec7e4
feat: add Sparse Embedders based on Sentence Transformers (#9588)
* Added backend class for SparseEncoder and also SentenceTransformersSparseTextEmbedder

* Added SentenceTransformersSparseDocumentEmbedder

* Created a separate _SentenceTransformersSparseEmbeddingBackendFactory and added tests

* Remove unused parameter

* Wrapped output into SparseEmbedding dataclass + fix tests

* Return correct SparseEmbedding, imports and tests

* fix fmt

* Style changes and fixes

* Added a test for embed function

* Added integration test and fixed some other tests

* Add lint fixes

* Fixed positional arguments

* fix types, simplify and more

* fix

* token fixes

* pydocs, small model in test, cache improvement

* try 3.9 for docs

* better to pin click

* release note

* small fix

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-09-19 14:00:13 +00:00
Sebastian Husch Lee
68168c45c9
chore: Bump transformers (#9740)
* Bump transformers

* Fix typing issue

* Pin transformers to less than 5
2025-09-02 13:37:56 +02:00
Abdelrahman Kaseb
b9a34dfebf
Fix: prevent in-place mutation of documents in Document Classifiers and Extractors (#9703)
* modify Documents Classifiers and Extractors to not make in-place changes

* Add e2e test for NER

* Add unit test for NER

* fixes + refinements

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-08-12 15:20:44 +02:00
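
The non-mutating pattern named in #9703 above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Doc` and `classify` are hypothetical stand-ins, not Haystack's `Document` class or any actual component code.

```python
from dataclasses import dataclass, field, replace


@dataclass
class Doc:  # hypothetical stand-in, not haystack's Document
    content: str
    meta: dict = field(default_factory=dict)


def classify(documents: list[Doc], label: str) -> list[Doc]:
    """Return new documents carrying the label; the inputs are left untouched."""
    # replace() builds a new Doc, and the dict literal builds a new meta dict,
    # so neither the original object nor its meta is modified.
    return [replace(doc, meta={**doc.meta, "classification": label}) for doc in documents]
```

Callers that keep a reference to the input list still see the original, unlabeled documents after the call.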
Stefano Fiorucci
c05d3f0051
chore: remove unused type:ignore and cast (#9690)
* chore: remove unused type:ignore and casts

* rm unused imports
2025-08-07 15:41:00 +02:00
Abdelrahman Kaseb
5f3c37d287
chore: adopt PEP 585 type hints (#9678)
* chore(lint): enforce and apply PEP 585 type hinting

* Run fmt fixes

* Fix all typing imports using some regex

* Fix all typing written in string in tests

* undo changes in the e2e tests

* make e2e test use list instead of List

* type fixes

* remove type:ignore

* pylint

* Remove typing from Usage example comments

* Remove typing from most of comments

* try to fix e2e tests on comm PRs

* fix

* Add tests typing.List in to adjust test compatibility
- test/components/agents/test_state_class.py
- test/components/converters/test_output_adapter.py
- test/components/joiners/test_list_joiner.py

* simplify pyproject

* improve relnote

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-08-07 10:23:14 +02:00
Stefano Fiorucci
ed48e9e965
test: fix Datadog tests and unpin ddtrace test dependency (#9659)
* test: fix Datadog tests and unpin ddtrace test dependency

* we need less env vars
2025-07-30 17:20:15 +02:00
Stefano Fiorucci
1d96e6e4af
fix: ChatMessage.from_user - raise error if text and content_parts are None; pin ddtrace (#9657)
* fix: allow empty text in ChatMessage.from_user

* pin ddtrace<3.11.0
2025-07-29 12:39:27 +02:00
David S. Batista
3b9b1ae802
feat: adding debugging breakpoints to Pipeline and Agent (#9611)
* wip: fixing tests

* wip: fixing tests

* wip: fixing tests

* wip: fixing tests

* fixing circular imports

* decoupling resume and initial run() for agent

* adding release notes

* re-raising BreakPointException from pipeline.run()

* fixing imports

* refactor: Refactor suggestions for Pipeline breakpoints (#9614)

* Refactoring

* Start adding debug_path into Breakpoint class

* Fully move debug_path into Breakpoint dataclass

* Simplifications in pipeline run logic

* More simplification

* lint

* More simplification

* Updates

* Rename resume_state to pipeline_snapshot

* PR comments

* Missed renaming of state in a few more places

* feat: Add dataclasses to represent a `PipelineSnapshot` and refactored to use it (#9619)

* Refactor to use dataclasses for PipelineSnapshot and AgentSnapshot

* Fix integration tests

* Mypy

* Fix mypy

* Fix lint

* Refactor AgentSnapshot to only contain needed info

* Fix mypy

* More refactoring

* removing unused import

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* feat: saving include_outputs_from intermediate results to `PipelineState` object (#9629)

* saving intermediate components results in include_outputs_from into the PipelineSnapshot

* cleaning up

* fixing tests

* fixing tests

* extending tests

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* linting

* moving intermediate results to pipeline state and adding pipeline outputs to state

* moving ordered_component_names and include_outputs_from to PipelineSnapshot

* moving original_input_data to PipelineSnapshot

* simplifying saving the intermediate results

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* Update haystack/dataclasses/breakpoints.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

---------

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* linting

* cleaning up

* avoiding creating PipelineSnapshot for every component run

* removing unnecessary code

* Update checks in Agent to not unnecessarily create AgentSnapshot when not needed.

* Update haystack/components/agents/agent.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* Update haystack/components/agents/agent.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* cleaning up tests

* linting

---------

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2025-07-24 08:54:23 +00:00
Sebastian Husch Lee
7414ef6823
feat: Add image converters (#9628)
* Add image converters

* Fix tests

* Update haystack/components/converters/image/__init__.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-07-21 15:46:14 +00:00
Stefano Fiorucci
6a591bd027
feat: add ImageContent dataclass to include images in ChatMessage + OpenAI support (#9626)
2025-07-21 14:39:31 +02:00
Stefano Fiorucci
646eedf26a
chore: reenable HF API Embedders tests + improve HFAPIChatGenerator docstrings (#9589)
* chore: reenable some HF API tests + improve docstrings

* revert deletion
2025-07-04 09:39:43 +02:00
Sebastian Husch Lee
85258f0654
fix: Fix types and formatting pipeline test_run.py (#9575)
* Fix types in test_run.py

* Get test_run.py to pass fmt-check

* Add test_run to mypy checks

* Update test folder to pass ruff linting

* Fix merge

* Fix HF tests

* Fix hf test

* Try to fix tests

* Another attempt

* minor fix

* fix SentenceTransformersDiversityRanker

* skip integrations tests due to model unavailable on HF inference

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-07-03 09:49:09 +02:00
Stefano Fiorucci
c18f81283c
chore: fix deepset_sync.py for pylint + general linting improvements (#9558)
* chore: fix deepset_sync.py for pylint

* check .github with ruff

* fix

* Update .github/utils/pyproject_to_requirements.py

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

---------

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
2025-06-27 07:54:22 +00:00
Michele Pangrazzi
3207a76d50
chore: Update pydoc-markdown.sh (#9547)
* Make config path a $1 param ; Add usage in comment ; Add echo log

* Update sync command
2025-06-24 14:01:51 +02:00
Stefano Fiorucci
556dcc9e46
chore: update transformers test dependency (#9537)
2025-06-23 10:26:11 +02:00
Amna Mubashar
67a8f1249b
chore: update linter configuration for compatibility with latest ruff release (#9528)
* Fix linting

* Fix linting

* Update error suppression

* Update pre commit

* Update pyproject.toml
2025-06-18 09:53:19 +02:00
Stefano Fiorucci
7570f6b769
fix: re-export symbols in __init__.py files (#9521)
* chore: re-export symbols in __init__.py files

* release note
2025-06-16 16:29:08 +02:00
Stefano Fiorucci
f8155e1b77
chore: clean up (#9504)
2025-06-11 11:05:05 +02:00
Stefano Fiorucci
12665ade14
chore: simplify Haystack Hatch scripts (#9491)
* try unifying hatch scripts

* formatting

* simplify

* improve contributing guidelines

* fmt-check
2025-06-06 10:43:02 +02:00
Vladimir Blagojevic
b69d261280
chore: Make docstring-parser core dep (#9477)
* Make docstring-parser core dep

* Add reno note
2025-06-05 11:28:18 +02:00
Stefano Fiorucci
d8487c4d8d
chore: make mypy run with --check-untyped-defs; fix some errors (#9447)
* chore: make mypy run with --check-untyped-defs; fix some errors

* small fixes

* use HfPipeline

* fix license error
2025-05-27 07:35:25 +00:00
Denis Washington
eefda0452d
chore: Make the Haystack core "type complete" (#9438)
* chore: Make the Haystack core "type complete"

For libraries with a `py.typed` marker, it is [recommended][1] to
make all public interfaces "type complete", i.e. to explicitly
annotate all function parameters and return types. Doing so has the
following benefits:

- It maximizes the type information available to users and IDEs.
- It ensures that the argument and return types are the intended ones.
- It sidesteps differences in type inference between the different
  type checker implementations.

This change makes a first step towards type completeness by enabling
the Mypy `disallow_incomplete_defs` for the core modules (excluding
`haystack.components.*` and `haystack.testing.*`) and fixing the
resulting errors.

[1]: https://typing.python.org/en/latest/guides/libraries.html#how-much-of-my-library-needs-types

* chore: Add `python_version = 3.9` to Mypy config

This catches type constructs that are only supported in later Python
versions.

* Remove unused import

* try to fix linting

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-05-26 11:00:22 +02:00
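
To make the "type complete" idea from #9438 above concrete, here is a minimal sketch. The function names are made up for illustration and do not come from the Haystack codebase. The first definition annotates only one parameter, which is the kind of definition Mypy's `disallow_incomplete_defs` option reports; the second annotates every parameter and the return type.

```python
from typing import Optional


# Incomplete: `text` is annotated but `limit` and the return type are not,
# so mypy with `disallow_incomplete_defs` enabled would flag this definition.
def truncate_incomplete(text: str, limit=10):
    return text[:limit]


# Type complete: every parameter and the return type are annotated.
def truncate(text: str, limit: Optional[int] = 10) -> str:
    """Return `text` cut to `limit` characters, or unchanged if `limit` is None."""
    if limit is None:
        return text
    return text[:limit]
```

`Optional[int]` is used instead of `int | None` so the sketch also parses on Python 3.9, the version the Mypy config in this commit targets.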
Stefano Fiorucci
17432f710d
feat: introduce SentenceTransformersSimilarityRanker (#9415)
* new component + tests

* soft deprecation of TransformersSimilarityRanker + reno

* add comp files to slow workflow

* Apply suggestions from code review

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* self.model -> self._cross_encoder

* recommend installing sentence-transformers>=4.1.0

---------

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
2025-05-21 10:52:46 +02:00
Stefano Fiorucci
9ae7da8df3
test: workflow for slow/unstable integration tests (#9267)
* workflow for slow integration tests

* try changing skipper

* Trigger Build

* better names

* fix

* mv tika to slow

* try skipping slow workflow

* retry paths-ignore

* remove skipper

* Revert "remove skipper"

This reverts commit 302ed2f07f36b33fa61fde0843b5590d79b98d74.

* better skipper

* retry

* Revert "retry"

This reverts commit fe5dff68f496645cc45292d74fcd8d043e868392.

* try using one workflow

* trigger

* try to see if it fails

* cosmetic changes

* improvements

* try matrix

* retry

* fix

* clean up

* simplify datadog monitoring and trigger

* send event to datadog for nightly failures

* tests should run if: manual trigger, scheduled, PR has label, release branch, or relevant files changed

* clarify slow marker

* improve comments

* labels
2025-04-23 10:36:44 +02:00
Stefano Fiorucci
c5a0bf9eaf
clean pyproject (#9214)
2025-04-11 15:57:56 +02:00
Stefano Fiorucci
8bf41a8510
test: create e2e environment; stop testing spacy in unit tests (#9212)
* ci: create e2e environment; stop testing spacy in unit tests

* try fix

* fix yml

* exclude test python files

* self-referential environment

* do not use self-referential environment
2025-04-11 10:28:53 +00:00
Sebastian Husch Lee
0d6a392506
chore: Bump transformers (#9178)
* Bump transformers

* Bump to patched version

* change version
2025-04-09 08:42:17 +02:00
Stefano Fiorucci
3a6e98565e
ci: pin blis for python 3.9 (#9158)
2025-04-02 15:32:11 +02:00
Julian Risch
f687d49fec
feat: Add option to split by number of tokens to RecursiveDocumentSplitter (#9143)
* add token split_unit

* fix overlap with fallback

* reno

* mark as integration tests

* use type ignore instead of assert

* Update releasenotes/notes/recursive-splitter-token-df56428887ac45bd.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-04-01 09:48:59 +02:00
Vladimir Blagojevic
13941d8bd9
feat: LinkContentFetcher - replace requests with httpx, add async and http/2 (#9034)
* LinkContentFetcher - replace requests with httpx, add async and http/2

* Update haystack/components/fetchers/link_content.py

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* Update haystack/components/fetchers/link_content.py

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* PR feedback

* Merge sync and async

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-03-26 14:55:08 +01:00
Stefano Fiorucci
c5cde40d3a
unpin ruff and update code (#9040)
2025-03-14 14:53:25 +00:00
Sebastian Husch Lee
3d7d65a260
Pin ruff (#9038)
2025-03-14 12:00:21 +01:00
Sebastian Husch Lee
4edefe3e56
Feat: Support Azure Workload Identity Credential (#9012)
* Start adding support for passing callable to Azure components

* Add to chat version

* Fix test

* Add reno

* Add support to azure doc and text embedder

* Rename

* update llm metadata extractor

* Add tests for text embedder

* Update tests

* Remove unused fixture and import

* Update reno
2025-03-12 13:45:40 +01:00
Stefano Fiorucci
c04c900f26
build: drop Python 3.8 support (#8978)
* draft

* readd typing_extensions

* small fix + release note

* remove ruff target-version

* Update releasenotes/notes/drop-python-3.8-868710963e794c83.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-03-05 14:59:56 +00:00
Stefano Fiorucci
ec97f4d991
update transformers test dependency to 4.48.3 (#8979)
2025-03-05 14:49:34 +01:00
Stefano Fiorucci
9da6696a45
chore: make openapi-llm an optional dependency (#8958)
* openapi-llm should be an optional dependency

* rm empty line
2025-03-05 11:15:19 +01:00
Stefano Fiorucci
10f11d40d4
build: support python 3.13 (#8965)
* support python 3.13

* release note

* add python version info to contributing guide

* better explanation
2025-03-05 09:49:10 +00:00
Stefano Fiorucci
f3c44be904
refactor!: remove dataframe field from Document and ExtractedTableAnswer; make pandas optional (#8906)
* remove dataframe

* release note

* small fix

* group imports

* Update pyproject.toml

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* Update pyproject.toml

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* address feedback

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-03-04 11:06:07 +00:00
Amna Mubashar
28db039bca
feat: add run_async to HuggingfaceAPIChatGenerator (#8943)
* add run_async

* add release notes

* Add integration test
2025-03-03 16:51:30 +01:00
Sebastian Husch Lee
99a998f90b
feat: Add MSGToDocument converter (#8868)
* Initial commit of MSG converter from Bijay

* Updates to the MSG converter

* Add license header

* Add tests for msg converter

* Update converter

* Expanding tests

* Update docstrings

* add license header

* Add reno

* Add to inits and pydocs

* Add test for empty input

* Fix types

* Fix mypy

---------

Co-authored-by: Bijay Gurung <bijay.learning@gmail.com>
2025-02-24 08:12:32 +01:00
Sebastian Husch Lee
a516672cfb
fix: Fix data dog tracing (#8900)
* Fix data dog tracing

* Add reno

* Update imports

* Fix
2025-02-21 14:35:04 +01:00
Stefano Fiorucci
04c6136cc4
relax posthog pin (#8898)
2025-02-21 10:49:29 +01:00
Stefano Fiorucci
fcca7104d3
pin ddtrace<3.0.0 (#8897)
2025-02-21 08:14:41 +00:00
Michele Pangrazzi
44fb20c2d5
Add run_async to OpenAIChatGenerator (#8880)
* Implementation of run_async (wip)

* Add missing tests ; Move async tests to test_openai_async.py

* Add release note

* Update docstring

* Alignments with haystack-experimental implementation

* Lint: removed unused imports

* Update haystack/components/generators/chat/openai.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-02-20 16:51:46 +00:00
mathislucka
8c54f06a19
fix: component checks failing for components that return dataframes (#8873)
* fix: use is not to compare to sentinel value

* chore: release notes

* Update releasenotes/notes/fix-component-checks-with-ambiguous-truth-values-949c447b3702e427.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* fix: another sentinel value

* test: also test base class

* add pandas as test dependency

* format

* Trigger CI

* mark test with xfail strict=False

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-02-19 09:10:48 +00:00