267 Commits

Author SHA1 Message Date
Julian Risch
f687d49fec
feat: Add option to split by number of tokens to RecursiveDocumentSplitter (#9143)
* add token split_unit

* fix overlap with fallback

* reno

* mark as integration tests

* use type ignore instead of assert

* Update releasenotes/notes/recursive-splitter-token-df56428887ac45bd.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-04-01 09:48:59 +02:00
Vladimir Blagojevic
13941d8bd9
feat: LinkContentFetcher - replace requests with httpx, add async and http/2 (#9034)
* LinkContentFetcher - replace requests with httpx, add async and http/2

* Update haystack/components/fetchers/link_content.py

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* Update haystack/components/fetchers/link_content.py

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* PR feedback

* Merge sync and async

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-03-26 14:55:08 +01:00
Stefano Fiorucci
c5cde40d3a
unpin ruff and update code (#9040) 2025-03-14 14:53:25 +00:00
Sebastian Husch Lee
3d7d65a260
Pin ruff (#9038) 2025-03-14 12:00:21 +01:00
Sebastian Husch Lee
4edefe3e56
Feat: Support Azure Workload Identity Credential (#9012)
* Start adding support for passing callable to Azure components

* Add to chat version

* Fix test

* Add reno

* Add support to azure doc and text embedder

* Rename

* update llm metadata extractor

* Add tests for text embedder

* Update tests

* Remove unused fixture and import

* Update reno
2025-03-12 13:45:40 +01:00
Stefano Fiorucci
c04c900f26
build: drop Python 3.8 support (#8978)
* draft

* readd typing_extensions

* small fix + release note

* remove ruff target-version

* Update releasenotes/notes/drop-python-3.8-868710963e794c83.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-03-05 14:59:56 +00:00
Stefano Fiorucci
ec97f4d991
update transformers test dependency to 4.48.3 (#8979) 2025-03-05 14:49:34 +01:00
Stefano Fiorucci
9da6696a45
chore: make openapi-llm an optional dependency (#8958)
* openapi-llm should be an optional dependency

* rm empty line
2025-03-05 11:15:19 +01:00
Stefano Fiorucci
10f11d40d4
build: support python 3.13 (#8965)
* support python 3.13

* release note

* add python version info to contributing guide

* better explanation
2025-03-05 09:49:10 +00:00
Stefano Fiorucci
f3c44be904
refactor!: remove dataframe field from Document and ExtractedTableAnswer; make pandas optional (#8906)
* remove dataframe

* release note

* small fix

* group imports

* Update pyproject.toml

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* Update pyproject.toml

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* address feedback

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-03-04 11:06:07 +00:00
Amna Mubashar
28db039bca
feat: add run_async to HuggingfaceAPIChatGenerator (#8943)
* add run_async

* add release notes

* Add integration test
2025-03-03 16:51:30 +01:00
Sebastian Husch Lee
99a998f90b
feat: Add MSGToDocument converter (#8868)
* Initial commit of MSG converter from Bijay

* Updates to the MSG converter

* Add license header

* Add tests for msg converter

* Update converter

* Expanding tests

* Update docstrings

* add license header

* Add reno

* Add to inits and pydocs

* Add test for empty input

* Fix types

* Fix mypy

---------

Co-authored-by: Bijay Gurung <bijay.learning@gmail.com>
2025-02-24 08:12:32 +01:00
Sebastian Husch Lee
a516672cfb
fix: Fix Datadog tracing (#8900)
* Fix Datadog tracing

* Add reno

* Update imports

* Fix
2025-02-21 14:35:04 +01:00
Stefano Fiorucci
04c6136cc4
relax posthog pin (#8898) 2025-02-21 10:49:29 +01:00
Stefano Fiorucci
fcca7104d3
pin ddtrace<3.0.0 (#8897) 2025-02-21 08:14:41 +00:00
Michele Pangrazzi
44fb20c2d5
Add run_async to OpenAIChatGenerator (#8880)
* Implementation of run_async (wip)

* Add missing tests; move async tests to test_openai_async.py

* Add release note

* Update docstring

* Alignments with haystack-experimental implementation

* Lint: removed unused imports

* Update haystack/components/generators/chat/openai.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-02-20 16:51:46 +00:00
mathislucka
8c54f06a19
fix: component checks failing for components that return dataframes (#8873)
* fix: use is not to compare to sentinel value

* chore: release notes

* Update releasenotes/notes/fix-component-checks-with-ambiguous-truth-values-949c447b3702e427.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* fix: another sentinel value

* test: also test base class

* add pandas as test dependency

* format

* Trigger CI

* mark test with xfail strict=False

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-02-19 09:10:48 +00:00
Stefano Fiorucci
b5d2854b93
pin posthog<3.12.0 (#8841) 2025-02-11 10:30:57 +00:00
David S. Batista
f189a1c349
fix: LLMMetadataExtractor removing from_dict/to_dict AWS tests (#8840)
* removing from_dict/to_dict AWS tests

* removing boto3 import from tests
2025-02-11 09:40:58 +00:00
David S. Batista
f798a9e935
feat: adding LLMMetadataExtractor (#8833)
* fixing linting

* adding release notes

* updating tests

* adding to pydocs

* fixing typing due to Optional

* fixing docstring
2025-02-10 16:54:25 +00:00
Vladimir Blagojevic
fd5040108a
feat: Add OpenAPIConnector component, improve OpenAPI integration (#8808)
* Initial OpenAPIConnector

* Add reno note

* Format

* Add headers

* Add test dep

* Use haystack logger

* Fix test

* Minor fix, spin CI

* Update reno release note format

* Add to docs, pydocs improvements
2025-02-10 10:34:37 +01:00
mathislucka
eec91824bc
fix: pipeline run bugs in cyclic and acyclic pipelines (#8707)
* add component checks

* pipeline should run deterministically

* add FIFOQueue

* add agent tests

* add order dependent tests

* run new tests

* remove code that is not needed

* test: intermediate from cycle outputs are available outside cycle

* add tests for component checks (Claude)

* adapt tests for component checks (o1 review)

* chore: format

* remove tests that aren't needed anymore

* add _calculate_priority tests

* revert accidental change in pyproject.toml

* test format conversion

* adapt to naming convention

* chore: proper docstrings and type hints for PQ

* format

* add more unit tests

* rm unneeded comments

* test input consumption

* lint

* fix: docstrings

* lint

* format

* format

* fix license header

* fix license header

* add component run tests

* fix: pass correct input format to tracing

* fix types

* format

* format

* types

* add defaults from Socket instead of signature

- otherwise components with dynamic inputs would fail

* fix test names

* still wait for optional inputs on greedy variadic sockets

- mirrors previous behavior

* fix format

* wip: warn for ambiguous running order

* wip: alternative warning

* fix license header

* make code more readable

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Introduce content tracing to a behavioral test

* Fixing linting

* Remove debug print statements

* Fix tracer tests

* remove print

* test: test for component inputs

* test: remove testing for run order

* chore: update component checks from experimental

* chore: update pipeline and base from experimental

* refactor: remove unused method

* refactor: remove unused method

* refactor: outdated comment

* refactor: inputs state is updated as side effect

- to prepare for AsyncPipeline implementation

* format

* test: add file conversion test

* format

* fix: original implementation deepcopies outputs

* lint

* fix: from_dict was updated

* fix: format

* fix: test

* test: add test for thread safety

* remove unused imports

* format

* test: FIFOPriorityQueue

* chore: add release note

* fix: resolve merge conflict with mermaid changes

* fix: format

* fix: remove unused import

* refactor: rename to avoid accidental conflicts

* chore: remove unused inputs, add missing license header

* chore: extend release notes

* Update releasenotes/notes/fix-pipeline-run-2fefeafc705a6d91.yaml

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* fix: format

* fix: format

* Update release note

---------

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-02-06 14:19:47 +00:00
Stefano Fiorucci
877f826da0
refactor: HF API Embedders - use InferenceClient.feature_extraction instead of InferenceClient.post (#8794)
* HF API Embedders: refactoring

* rename variables

* rm leftovers

* rm pin

* rm unused import

* relnote

* warning with truncate/normalize and serverless inference API

* test that warnings are raised
2025-02-03 15:11:16 +00:00
Amna Mubashar
379711f63e
fix: Pin nltk version for sentence tokenizer (#8786)
* Pin nltk version for sentence tokenizer

* Update pyproject.toml

* Update haystack/components/preprocessors/sentence_tokenizer.py

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-01-31 17:01:00 +01:00
Stefano Fiorucci
3ef609a3e8
temporarily pin huggingface_hub<0.28.0 (#8790) 2025-01-31 10:35:15 +01:00
Stefano Fiorucci
0ac47b0064
pin numba>=0.54.0 (#8773) 2025-01-27 11:55:18 +01:00
Stefano Fiorucci
f96839e139
chore: update transformers test dependency (#8752)
* update transformers test dependency

* add pad_token_id to the mock tokenizer

* fix HFLocal test + new test
2025-01-21 14:43:27 +01:00
Stefano Fiorucci
2bf6bf6a45
build: add jsonschema library to core dependencies (#8753)
* add jsonschema to core dependencies

* release note
2025-01-21 10:07:56 +01:00
Vladimir Blagojevic
d147c7658f
feat: Add ComponentTool to Haystack tools (#8693)
* Initial ComponentTool
---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-01-13 11:15:33 +01:00
Sebastian Husch Lee
28ad78c73d
feat: Add XLSXToDocument converter (#8522)
* Add draft of the Excel To Document converter

* Add license header

* Add release note

* Use Union instead of pipe

* Add openpyxl as additional dep

* Fix zip issue

* few updates from Bijay

* Update deps

* Add markdown test

* Adding more example excels and expanding tests

* Added more tests

* Fix windows test by setting lineterminator

* Addressing PR comments

* PR comments

* Fix linting
2025-01-09 09:03:19 +01:00
Stefano Fiorucci
2bc58d2987
feat: support for tools in HuggingFaceAPIChatGenerator (#8661)
* message conversion function

* hfapi w tools

* right test file + hf_hub version

* release note

* feedback
2024-12-19 15:04:37 +01:00
Stefano Fiorucci
96b4a1d2fd
feat: Tool dataclass - unified abstraction to represent tools (#8652)
* draft

* del HF token in tests

* adaptations

* progress

* fix type

* import sorting

* more control on deserialization

* release note

* improvements

* support name field

* fix chatpromptbuilder test

* port Tool from experimental

* release note

* docs upd

* Update tool.py

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-18 11:36:44 +00:00
Stefano Fiorucci
2a9a6401d2
chore: pin openai>=1.56.1 (#8632)
* pin openai>=1.56.1

* release note
2024-12-12 16:26:38 +01:00
David S. Batista
248dccbdd3
chore: fixing pylint issues (#8610)
* initial import

* fixing internal methods

* fixing some internal methods

* modify _preprocess

* fixed internal methods

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-12-09 16:53:37 +00:00
Stefano Fiorucci
de7099e560
ci: add job to check imports (#8594)
* try checking imports

* clarify error message

* better fmt

* do not show complete list of successfully imported packages

* refinements

* relnote

* add missing forward references

* better function name

* linting

* fix linting

* Update .github/utils/check_imports.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-11-29 14:00:59 +00:00
Stefano Fiorucci
f085959067
chore: declare requires-python<3.13 in pyproject (#8547)
* restrict to python<3.13

* try unpinning dulwich

* reintroduce dulwich pin
2024-11-15 09:28:39 +00:00
Silvano Cerza
ebb45d3d1e
Remove ddtrace version pin (#8529) 2024-11-11 11:21:10 +01:00
Stefano Fiorucci
c7b898994e
build: unpin numpy + use Python 3.9 in CI (#8492)
* try unpinning numpy

* try python 3.9

* release note
2024-10-28 12:15:17 +01:00
Silvano Cerza
0157459a7b
Pin ddtrace test dependency to fix tests (#8478) 2024-10-22 10:19:25 +00:00
Stefano Fiorucci
f6935d1456
ci: add pip to test dependencies (#8475)
* add pip to test dependencies

* trigger

* release note

* rm trigger
2024-10-22 08:35:30 +00:00
Stefano Fiorucci
7788bfe558
ci: upgrade Hatch to 1.13.0 and adopt uv as installer (#8313)
* try uv

* upgrade hatch

* rm unnecessary specification

* release note
2024-10-17 10:32:14 +02:00
Silvano Cerza
29672d4b42
feat: Add JSONConverter Component (#8397)
* Add JSONConverter Component

* Handle some corner cases

* Add JSONConverter to pydoc config

* Add a way to extract all non content fields as metadata

* Small fix in docstring

* Fix tests

* docstrings upd

* Update json.py

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-09-25 12:34:51 +02:00
Silvano Cerza
4b77ec1b6f
Fix codespell config (#8392) 2024-09-24 12:00:45 +02:00
Vladimir Blagojevic
badd0594cc
feat: Port NLTKDocumentSplitter from dC to Haystack (#8350)
* Port NLTKDocumentSplitter from dC to Haystack

* Improve pydocs

* Use haystack logging

* Add NLTKDocumentSplitter to __init__.py

* Use haystack logging, rename test classes

* Fixing _needs_join return

* Linting

* PR feedback

* More static methods

* Increase test coverage

* Compile pattern

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-09-17 13:59:19 +02:00
Silvano Cerza
da49e782e2
chore: Make arrow an optional dependency (#8345)
* Make arrow an optional dependency

* Fix imports
2024-09-09 16:09:51 +02:00
Mo Sriha
75955922b9
feat: Add current date in UTC to PromptBuilder (#8233)
* initial commit

* add unit tests

* add release notes

* update function name
2024-09-09 09:47:03 +02:00
Stefano Fiorucci
25d333bed3
update transformers (#8296) 2024-08-27 16:04:11 +00:00
Stefano Fiorucci
6b0ee4c193
chore: update test dependency and LazyImport block to make compatibility with sentence-transformers>=3.0.0 explicit (#8295)
* sentence-transformers-3 update test dep and lazyimport block

* clearer release note
2024-08-27 15:51:03 +00:00
Tobias Wochinger
5a3ea75196
docs: document Python 3.11 and 3.12 support (#8159)
* docs: add Python 3.11 and 3.12 to supported versions

* docs: add release notes
2024-08-02 14:46:20 +02:00
Tobias Wochinger
4dde6fbaec
build: unpin structlog (#8071) 2024-07-24 20:58:34 +02:00