280 Commits

Author SHA1 Message Date
MaChi
1fd4dfddcd
Merge branch 'main' into feature/chinese-document-splitter 2025-06-05 17:37:55 +08:00
Vladimir Blagojevic
b69d261280
chore: Make docstring-parser core dep (#9477)
* Make docstring-parser core dep

* Add reno note
2025-06-05 11:28:18 +02:00
mc112611
10ddc6edc0 Add test script for ChineseDocumentSplitter, remove Chinese comments, and fix lint issues 2025-06-05 16:08:21 +08:00
David S. Batista
7b2d038098 fixing lazy import 2025-06-04 18:00:57 +02:00
David S. Batista
32cd95c602 adding hanlp dependency 2025-06-04 17:45:26 +02:00
Stefano Fiorucci
d8487c4d8d
chore: make mypy run with --check-untyped-defs; fix some errors (#9447)
* chore: make mypy run with --check-untyped-defs; fix some errors

* small fixes

* use HfPipeline

* fix license error
2025-05-27 07:35:25 +00:00
Denis Washington
eefda0452d
chore: Make the Haystack core "type complete" (#9438)
* chore: Make the Haystack core "type complete"

For libraries with a `py.typed` marker, it is [recommended][1] to
make all public interfaces "type complete", i.e. to explicitly
annotate all function parameters and return types. Doing so has the
following benefits:

- It maximizes the type information available to users and IDEs.
- It ensures that the argument and return types are the intended ones.
- It sidesteps differences in type inference between the different
  type checker implementations.

This change makes a first step towards type completeness by enabling
the Mypy `disallow_incomplete_defs` for the core modules (excluding
`haystack.components.*` and `haystack.testing.*`) and fixing the
resulting errors.

[1]: https://typing.python.org/en/latest/guides/libraries.html#how-much-of-my-library-needs-types

* chore: Add `python_version = 3.9` to Mypy config

This catches type constructs that are only supported in later Python
versions.

* Remove unused import

* try to fix linting

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-05-26 11:00:22 +02:00
Stefano Fiorucci
17432f710d
feat: introduce SentenceTransformersSimilarityRanker (#9415)
* new component + tests

* soft deprecation of TransformersSimilarityRanker + reno

* add comp files to slow workflow

* Apply suggestions from code review

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>

* self.model -> self._cross_encoder

* recommend installing sentence-transformers>=4.1.0

---------

Co-authored-by: Sebastian Husch Lee <10526848+sjrl@users.noreply.github.com>
2025-05-21 10:52:46 +02:00
Stefano Fiorucci
9ae7da8df3
test: workflow for slow/unstable integration tests (#9267)
* workflow for slow integration tests

* try changing skipper

* Trigger Build

* better names

* fix

* mv tika to slow

* try skipping slow workflow

* retry paths-ignore

* remove skipper

* Revert "remove skipper"

This reverts commit 302ed2f07f36b33fa61fde0843b5590d79b98d74.

* better skipper

* retry

* Revert "retry"

This reverts commit fe5dff68f496645cc45292d74fcd8d043e868392.

* try using one workflow

* trigger

* try to see if it fails

* cosmetic changes

* improvements

* try matrix

* retry

* fix

* clean up

* simplify datadog monitoring and trigger

* send event to datadog for nightly failures

* tests should run if: manual trigger, scheduled, PR has label, release branch, or relevant files changed

* clarify slow marker

* improve comments

* labels
2025-04-23 10:36:44 +02:00
Stefano Fiorucci
c5a0bf9eaf
clean pyproject (#9214) 2025-04-11 15:57:56 +02:00
Stefano Fiorucci
8bf41a8510
test: create e2e environment; stop testing spacy in unit tests (#9212)
* ci: create e2e environment; stop testing spacy in unit tests

* try fix

* fix yml

* exclude test python files

* self-referential environment

* do not use self-referential environment
2025-04-11 10:28:53 +00:00
Sebastian Husch Lee
0d6a392506
chore: Bump transformers (#9178)
* Bump transformers

* Bump to patched version

* change version
2025-04-09 08:42:17 +02:00
Stefano Fiorucci
3a6e98565e
ci: pin blis for python 3.9 (#9158) 2025-04-02 15:32:11 +02:00
Julian Risch
f687d49fec
feat: Add option to split by number of tokens to RecursiveDocumentSplitter (#9143)
* add token split_unit

* fix overlap with fallback

* reno

* mark as integration tests

* use type ignore instead of assert

* Update releasenotes/notes/recursive-splitter-token-df56428887ac45bd.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-04-01 09:48:59 +02:00
Vladimir Blagojevic
13941d8bd9
feat: LinkContentFetcher - replace requests with httpx, add async and http/2 (#9034)
* LinkContentFetcher - replace requests with httpx, add async and http/2

* Update haystack/components/fetchers/link_content.py

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* Update haystack/components/fetchers/link_content.py

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* PR feedback

* Merge sync and async

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-03-26 14:55:08 +01:00
Stefano Fiorucci
c5cde40d3a
unpin ruff and update code (#9040) 2025-03-14 14:53:25 +00:00
Sebastian Husch Lee
3d7d65a260
Pin ruff (#9038) 2025-03-14 12:00:21 +01:00
Sebastian Husch Lee
4edefe3e56
Feat: Support Azure Workload Identity Credential (#9012)
* Start adding support for passing callable to Azure components

* Add to chat version

* Fix test

* Add reno

* Add support to azure doc and text embedder

* Rename

* update llm metadata extractor

* Add tests for text embedder

* Update tests

* Remove unused fixture and import

* Update reno
2025-03-12 13:45:40 +01:00
Stefano Fiorucci
c04c900f26
build: drop Python 3.8 support (#8978)
* draft

* readd typing_extensions

* small fix + release note

* remove ruff target-version

* Update releasenotes/notes/drop-python-3.8-868710963e794c83.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-03-05 14:59:56 +00:00
Stefano Fiorucci
ec97f4d991
update transformers test dependency to 4.48.3 (#8979) 2025-03-05 14:49:34 +01:00
Stefano Fiorucci
9da6696a45
chore: make openapi-llm an optional dependency (#8958)
* openapi-llm should be an optional dependency

* rm empty line
2025-03-05 11:15:19 +01:00
Stefano Fiorucci
10f11d40d4
build: support python 3.13 (#8965)
* support python 3.13

* release note

* add python version info to contributing guide

* better explanation
2025-03-05 09:49:10 +00:00
Stefano Fiorucci
f3c44be904
refactor!: remove dataframe field from Document and ExtractedTableAnswer; make pandas optional (#8906)
* remove dataframe

* release note

* small fix

* group imports

* Update pyproject.toml

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* Update pyproject.toml

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* address feedback

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-03-04 11:06:07 +00:00
Amna Mubashar
28db039bca
feat: add run_async to HuggingFaceAPIChatGenerator (#8943)
* add run_async

* add release notes

* Add integration test
2025-03-03 16:51:30 +01:00
Sebastian Husch Lee
99a998f90b
feat: Add MSGToDocument converter (#8868)
* Initial commit of MSG converter from Bijay

* Updates to the MSG converter

* Add license header

* Add tests for msg converter

* Update converter

* Expanding tests

* Update docstrings

* add license header

* Add reno

* Add to inits and pydocs

* Add test for empty input

* Fix types

* Fix mypy

---------

Co-authored-by: Bijay Gurung <bijay.learning@gmail.com>
2025-02-24 08:12:32 +01:00
Sebastian Husch Lee
a516672cfb
fix: Fix data dog tracing (#8900)
* Fix data dog tracing

* Add reno

* Update imports

* Fix
2025-02-21 14:35:04 +01:00
Stefano Fiorucci
04c6136cc4
relax posthog pin (#8898) 2025-02-21 10:49:29 +01:00
Stefano Fiorucci
fcca7104d3
pin ddtrace<3.0.0 (#8897) 2025-02-21 08:14:41 +00:00
Michele Pangrazzi
44fb20c2d5
Add run_async to OpenAIChatGenerator (#8880)
* Implementation of run_async (wip)

* Add missing tests; move async tests to test_openai_async.py

* Add release note

* Update docstring

* Alignments with haystack-experimental implementation

* Lint: removed unused imports

* Update haystack/components/generators/chat/openai.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-02-20 16:51:46 +00:00
mathislucka
8c54f06a19
fix: component checks failing for components that return dataframes (#8873)
* fix: use is not to compare to sentinel value

* chore: release notes

* Update releasenotes/notes/fix-component-checks-with-ambiguous-truth-values-949c447b3702e427.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* fix: another sentinel value

* test: also test base class

* add pandas as test dependency

* format

* Trigger CI

* mark test with xfail strict=False

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-02-19 09:10:48 +00:00
Stefano Fiorucci
b5d2854b93
pin posthog<3.12.0 (#8841) 2025-02-11 10:30:57 +00:00
David S. Batista
f189a1c349
fix: LLMMetadataExtractor removing from_dict/to_dict AWS tests (#8840)
* removing from_dict/to_dict AWS tests

* removing boto3 import from tests
2025-02-11 09:40:58 +00:00
David S. Batista
f798a9e935
feat: adding LLMMetadataExtractor (#8833)
* fixing linting

* adding release notes

* updating tests

* adding to pydocs

* fixing typing due to Optional

* fixing docstring
2025-02-10 16:54:25 +00:00
Vladimir Blagojevic
fd5040108a
feat: Add OpenAPIConnector component, improve OpenAPI integration (#8808)
* Initial OpenAPIConnector

* Add reno note

* Format

* Add headers

* Add test dep

* Use haystack logger

* Fix test

* Minor fix, spin CI

* Update reno release note format

* Add to docs, pydocs improvements
2025-02-10 10:34:37 +01:00
mathislucka
eec91824bc
fix: pipeline run bugs in cyclic and acyclic pipelines (#8707)
* add component checks

* pipeline should run deterministically

* add FIFOQueue

* add agent tests

* add order dependent tests

* run new tests

* remove code that is not needed

* test: intermediate from cycle outputs are available outside cycle

* add tests for component checks (Claude)

* adapt tests for component checks (o1 review)

* chore: format

* remove tests that aren't needed anymore

* add _calculate_priority tests

* revert accidental change in pyproject.toml

* test format conversion

* adapt to naming convention

* chore: proper docstrings and type hints for PQ

* format

* add more unit tests

* rm unneeded comments

* test input consumption

* lint

* fix: docstrings

* lint

* format

* format

* fix license header

* fix license header

* add component run tests

* fix: pass correct input format to tracing

* fix types

* format

* format

* types

* add defaults from Socket instead of signature

- otherwise components with dynamic inputs would fail

* fix test names

* still wait for optional inputs on greedy variadic sockets

- mirrors previous behavior

* fix format

* wip: warn for ambiguous running order

* wip: alternative warning

* fix license header

* make code more readable

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Introduce content tracing to a behavioral test

* Fixing linting

* Remove debug print statements

* Fix tracer tests

* remove print

* test: test for component inputs

* test: remove testing for run order

* chore: update component checks from experimental

* chore: update pipeline and base from experimental

* refactor: remove unused method

* refactor: remove unused method

* refactor: outdated comment

* refactor: inputs state is updated as side effect

- to prepare for AsyncPipeline implementation

* format

* test: add file conversion test

* format

* fix: original implementation deepcopies outputs

* lint

* fix: from_dict was updated

* fix: format

* fix: test

* test: add test for thread safety

* remove unused imports

* format

* test: FIFOPriorityQueue

* chore: add release note

* fix: resolve merge conflict with mermaid changes

* fix: format

* fix: remove unused import

* refactor: rename to avoid accidental conflicts

* chore: remove unused inputs, add missing license header

* chore: extend release notes

* Update releasenotes/notes/fix-pipeline-run-2fefeafc705a6d91.yaml

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* fix: format

* fix: format

* Update release note

---------

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-02-06 14:19:47 +00:00
Stefano Fiorucci
877f826da0
refactor: HF API Embedders - use InferenceClient.feature_extraction instead of InferenceClient.post (#8794)
* HF API Embedders: refactoring

* rename variables

* rm leftovers

* rm pin

* rm unused import

* relnote

* warning with truncate/normalize and serverless inference API

* test that warnings are raised
2025-02-03 15:11:16 +00:00
Amna Mubashar
379711f63e
fix: Pin nltk version for sentence tokenizer (#8786)
* Pin nltk version for sentence tokenizer

* Update pyproject.toml

* Update haystack/components/preprocessors/sentence_tokenizer.py

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-01-31 17:01:00 +01:00
Stefano Fiorucci
3ef609a3e8
temporarily pin huggingface_hub<0.28.0 (#8790) 2025-01-31 10:35:15 +01:00
Stefano Fiorucci
0ac47b0064
pin numba>=0.54.0 (#8773) 2025-01-27 11:55:18 +01:00
Stefano Fiorucci
f96839e139
chore: update transformers test dependency (#8752)
* update transformers test dependency

* add pad_token_id to the mock tokenizer

* fix HFLocal test + new test
2025-01-21 14:43:27 +01:00
Stefano Fiorucci
2bf6bf6a45
build: add jsonschema library to core dependencies (#8753)
* add jsonschema to core dependencies

* release note
2025-01-21 10:07:56 +01:00
Vladimir Blagojevic
d147c7658f
feat: Add ComponentTool to Haystack tools (#8693)
* Initial ComponentTool
---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-01-13 11:15:33 +01:00
Sebastian Husch Lee
28ad78c73d
feat: Add XLSXToDocument converter (#8522)
* Add draft of the Excel To Document converter

* Add license header

* Add release note

* Use Union instead of pipe

* Add openpyxl as additional dep

* Fix zip issue

* few updates from Bijay

* Update deps

* Add markdown test

* Adding more example excels and expanding tests

* Added more tests

* Fix windows test by setting lineterminator

* Addressing PR comments

* PR comments

* Fix linting
2025-01-09 09:03:19 +01:00
Stefano Fiorucci
2bc58d2987
feat: support for tools in HuggingFaceAPIChatGenerator (#8661)
* message conversion function

* hfapi w tools

* right test file + hf_hub version

* release note

* feedback
2024-12-19 15:04:37 +01:00
Stefano Fiorucci
96b4a1d2fd
feat: Tool dataclass - unified abstraction to represent tools (#8652)
* draft

* del HF token in tests

* adaptations

* progress

* fix type

* import sorting

* more control on deserialization

* release note

* improvements

* support name field

* fix chatpromptbuilder test

* port Tool from experimental

* release note

* docs upd

* Update tool.py

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-18 11:36:44 +00:00
Stefano Fiorucci
2a9a6401d2
chore: pin openai>=1.56.1 (#8632)
* pin openai>=1.56.1

* release note
2024-12-12 16:26:38 +01:00
David S. Batista
248dccbdd3
chore: fixing pylint issues (#8610)
* initial import

* fixing internal methods

* fixing some internal methods

* modify _preprocess

* fixed internal methods

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-12-09 16:53:37 +00:00
Stefano Fiorucci
de7099e560
ci: add job to check imports (#8594)
* try checking imports

* clarify error message

* better fmt

* do not show complete list of successfully imported packages

* refinements

* relnote

* add missing forward references

* better function name

* linting

* fix linting

* Update .github/utils/check_imports.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-11-29 14:00:59 +00:00
Stefano Fiorucci
f085959067
chore: declare requires-python<3.13 in pyproject (#8547)
* restrict to python<3.13

* try unpinning dulwich

* reintroduce dulwich pin
2024-11-15 09:28:39 +00:00
Silvano Cerza
ebb45d3d1e
Remove ddtrace version pin (#8529) 2024-11-11 11:21:10 +01:00