413 Commits

Author SHA1 Message Date
David S. Batista
c037052581
feat: adding function to detect unmapped CID characters in PDFMinerToDocument (#8992)
* adding function to detect unmapped CID characters

* adding release notes

* adding test for logs
2025-03-06 15:44:06 +00:00
David S. Batista
4c9d08add5
feat: async support for the HuggingFaceLocalChatGenerator (#8981)
* adding async run method

* passing an optional ThreadExecutor

* adding tests

* adding release notes

* nit: license

* fixing linting

* Update releasenotes/notes/adding-async-huggingface-local-chat-generator-962512f52282d12d.yaml

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Use Phi instead (#8982)

* build: drop Python 3.8 support (#8978)

* draft

* readd typing_extensions

* small fix + release note

* remove ruff target-version

* Update releasenotes/notes/drop-python-3.8-868710963e794c83.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* Update unstable version to 2.12.0-rc0 (#8983)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* fix: allow support for `include_usage` in streaming using OpenAIChatGenerator (#8968)

* fix error in handling usage completion chunk

* ci: improve release notes format checking (#8984)

* chore: fix invalid release note

* try improving relnote linting

* add relnotes path

* fix bad release note

* improve reno config

* fix: handle async tests in `HuggingFaceAPIChatGenerator` to prevent error (#8986)

* add missing asyncio

* explicitly close connection in the test

* Fix tests (#8990)

* docs: Update docstrings of `BranchJoiner` (#8988)

* Update docstrings

* Add a bit more explanatory text

* Add reno

* Update haystack/components/joiners/branch.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/joiners/branch.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/joiners/branch.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/joiners/branch.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Fix formatting

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* PR comments

* destroying the ThreadPoolExecutor when the generator instance is destroyed, only if it was not passed in externally

* fixing bug in streaming_callback

* PR comments

---------

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Haystack Bot <73523382+HaystackBot@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2025-03-06 15:57:11 +01:00
Amna Mubashar
ae26e7580b
fix: handle async tests in HuggingFaceAPIChatGenerator to prevent error (#8986)
* add missing asyncio

* explicitly close connection in the test
2025-03-06 10:55:01 +01:00
Amna Mubashar
13c3768d49
fix: allow support for include_usage in streaming using OpenAIChatGenerator (#8968)
* fix error in handling usage completion chunk
2025-03-05 18:30:26 +01:00
Stefano Fiorucci
c04c900f26
build: drop Python 3.8 support (#8978)
* draft

* readd typing_extensions

* small fix + release note

* remove ruff target-version

* Update releasenotes/notes/drop-python-3.8-868710963e794c83.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-03-05 14:59:56 +00:00
Sebastian Husch Lee
4a87ceb0ed
Use Phi instead (#8982)
2025-03-05 15:53:26 +01:00
Sebastian Husch Lee
f741df88df
fix: Update flaky HuggingFace Generator tests to use more reliable model and add instruction tokens (#8980)
* Fix test

* Make other HF tests more reliable

* Add back test
2025-03-05 15:26:17 +01:00
Julian Risch
b77f2bad79
feat: Add async run to DocumentWriter (#8962)
* add async run to DocumentWriter

* reno
2025-03-05 11:53:35 +01:00
Stefano Fiorucci
f3c44be904
refactor!: remove dataframe field from Document and ExtractedTableAnswer; make pandas optional (#8906)
* remove dataframe

* release note

* small fix

* group imports

* Update pyproject.toml

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* Update pyproject.toml

Co-authored-by: Julian Risch <julian.risch@deepset.ai>

* address feedback

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2025-03-04 11:06:07 +00:00
Mohammed Abdul Razak Wahab
0d65b4caa7
feat: Enhance error handling in Azure document embedder (#8941)
* feat: Enhance error handling in Azure document embedder

* add release notes

* address review comments

* Update releasenotes/notes/add-azure-embedder-exception-handler-c10ea46fb536de3b.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* more alignment with OpenAI impl

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-03-04 11:16:08 +01:00
Amna Mubashar
28db039bca
feat: add run_async to HuggingfaceAPIChatGenerator (#8943)
* add run_async

* add release notes

* Add integration test
2025-03-03 16:51:30 +01:00
tstadel
13968cc15b
fix: in OpenAIChatGenerator set additionalProperties to False when tools_strict=True (#8913)
* fix: set ComponentTool additionalProperties for OpenAI tools_strict=True

* add reno

* Move the additionalProperties into the OpenAIChatGenerator

* Remove

* Put additionalProperties into the correct place

* Fix test

* Update releasenotes/notes/fix-componenttool-for-openai-tools_strict-998e5cd7ebc6ec19.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Sebastian Husch Lee <sebastian.lee@deepset.ai>
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-03-03 16:23:24 +01:00
Sebastian Husch Lee
296e31c182
feat: Add Type Validation parameter for Pipeline Connections (#8875)
* Starting to refactor type util tests to be more systematic

* refactoring

* Expand tests

* Update to type utils

* Add missing subclass check

* Expand and refactor tests, introduce type_validation Literal

* More test refactoring

* Test refactoring, adding type validation variable to pipeline base

* Update relaxed version of type checking to pass all newly added tests

* trim whitespace

* Add tests

* cleanup

* Updates docstrings

* Add reno

* docs

* Fix mypy and add docstrings

* Changes based on advice from Tobi

* Remove unused imports

* Doc strings

* Add connection type validation to to_dict and from_dict

* Update tests

* Fix test

* Also save connection_type_validation at global pipeline level

* Fix tests

* Remove connection type validation from the connect level, only keep at pipeline level

* Formatting

* Fix tests

* formatting
2025-03-03 16:00:22 +01:00
Sebastian Husch Lee
00fe4d157d
feat: Add run async for AzureOpenAIChatGenerator (#8948)
* Add tests for run_async

* Add reno

* Add async client

* Add init test

* Add comment

* Fix test

* Update releasenotes/notes/run-async-azure-54450f0c2495f5c8.yaml

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

---------

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
2025-03-03 14:17:18 +00:00
Sebastian Husch Lee
52a028251c
refactor!: update AzureOCRDocumentConverter to not use the dataframe field for tabular Documents (#8885)
* Save document as a csv table now

* Fix tests

* Fix tests

* Add reno
2025-03-03 12:45:02 +00:00
Michele Pangrazzi
209e6d5ff0
remove duplicate test (#8944)
2025-02-28 13:27:43 +00:00
Michele Pangrazzi
db4f23771a
Avoid mutating self.routes in ConditionalRouter to_dict method (#8936)
* Avoid mutating self.routes in ConditionalRouter to_dict method

* Add release note

* Update releasenotes/notes/fix-conditional-router-to-dict-5af887da50effe11.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* Make test_router_to_dict_does_not_mutate_routes more robust (add another roundtrip)

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-02-26 12:34:35 +01:00
Michele Pangrazzi
d1e503e5c7
skip HF API integration test (#8938)
2025-02-26 12:10:54 +01:00
Julian Risch
6652dd7550
Revert "test: skip HF API live integration tests (#8889)" (#8914)
* Revert "test: skip HF API live integration tests (#8889)"

This reverts commit 56a3a9bd61b7391ae91e3d8179b3b33918ef4932.

* Replace zephyr-7b-beta model with SmolLM2-1.7B-Instruct

* Use zephyr-7b-beta model but extend instructions

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-02-25 09:03:20 +01:00
Sebastian Husch Lee
af3c89a257
feat: In FileTypeRouter add .msg to "application/vnd.ms-outlook" mapping (#8910)
* Add .msg mimetype support in file type router

* Add reno

* Update tests
2025-02-24 09:10:17 +01:00
Sebastian Husch Lee
99a998f90b
feat: Add MSGToDocument converter (#8868)
* Initial commit of MSG converter from Bijay

* Updates to the MSG converter

* Add license header

* Add tests for msg converter

* Update converter

* Expanding tests

* Update docstrings

* add license header

* Add reno

* Add to inits and pydocs

* Add test for empty input

* Fix types

* Fix mypy

---------

Co-authored-by: Bijay Gurung <bijay.learning@gmail.com>
2025-02-24 08:12:32 +01:00
David S. Batista
7d51793727
chore: cleaning up unused imports in tests (#8887)
2025-02-20 16:56:16 +00:00
Michele Pangrazzi
44fb20c2d5
Add run_async to OpenAIChatGenerator (#8880)
* Implementation of run_async (wip)

* Add missing tests ; Move async tests to test_openai_async.py

* Add release note

* Update docstring

* Alignments with haystack-experimental implementation

* Lint: removed unused imports

* Update haystack/components/generators/chat/openai.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2025-02-20 16:51:46 +00:00
Stefano Fiorucci
56a3a9bd61
test: skip HF API live integration tests (#8889)
* skip HF API integration tests

* better wording
2025-02-20 16:38:57 +00:00
Sebastian Husch Lee
62d0d5d3d5
Update default output type of list joiner to be correct (#8881)
2025-02-20 10:54:50 +01:00
Sebastian Husch Lee
8cafcddb00
chore: Remove print statements from tests and mention of old name (#8883)
* Remove print statements from tests

* Remove mention of Canals

* Remove another mention
2025-02-20 10:24:26 +01:00
Sebastian Husch Lee
52909a0c81
fix: Fix OpenAIChatGenerator + tools + streaming (#8879)
* Fix chat generator + tools + streaming

* Add reno

* Update docs

* Remove unused import

* add doc

* Fix test

* small cleanup

* PR comments

* fix test

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-02-20 08:40:22 +01:00
mathislucka
8c54f06a19
fix: component checks failing for components that return dataframes (#8873)
* fix: use is not to compare to sentinel value

* chore: release notes

* Update releasenotes/notes/fix-component-checks-with-ambiguous-truth-values-949c447b3702e427.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* fix: another sentinel value

* test: also test base class

* add pandas as test dependency

* format

* Trigger CI

* mark test with xfail strict=False

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2025-02-19 09:10:48 +00:00
Sebastian Husch Lee
0c62087dd7
Make openai test more robust (#8872)
2025-02-18 11:38:16 +01:00
Sebastian Husch Lee
2f383bce25
feat: Update list joiner (#8851)
* Update ListJoiner to have default type List

* Add reno

* Add more tests

* Remove unused import

* Fix mypy

* Update docstrings

* Update haystack/components/joiners/list_joiner.py

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

---------

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
2025-02-14 09:47:19 +01:00
Ulises M
bfdad40a80
feat: Add ONNX & OpenVINO backend support, and torch dtype kwargs in Sentence Transformers Components (#8813)
* initial rough draft

* expose backend instead of extracting from model_kwargs

* explictly set backend model path

* add reno

* expose backend for ST diversity backend

* add dtype tests and expose kwargs to ST ranker for backend parameters

* skip dtype tests as torch isn't compiled with CUDA

* add new openvino dependency release, unskip tests

* resolve suggestion

* mock calls, turn integrations into unit tests

* remove unnecessary test dependencies
2025-02-13 12:04:14 +01:00
Sebastian Husch Lee
71416c81bc
feat: Add store_full_path to converter (#8849)
* Add missing store_full_path to converter

* Add release note

* Fix pylint
2025-02-12 17:11:59 +01:00
Vladimir Blagojevic
a7c1661f13
fix: Look through all streaming chunks for tools calls (#8829)
* Look through all streaming chunks for tools calls

* Add reno note

* mypy fixes

* Improve robustness

* Don't concatenate, use the last value

* typing

* Update releasenotes/notes/improve-tool-call-chunk-search-986474e814af17a7.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* Small refactoring

* isort

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-02-11 13:25:39 +01:00
Sebastian Husch Lee
2c0a72844f
Fix splitter when table is only one row wide (#8839)
2025-02-11 09:55:35 +00:00
David S. Batista
f189a1c349
fix: LLMMetadataExtractor removing from_dict/to_dict AWS tests (#8840)
* removing from_dict/to_dict AWS tests

* removing boto3 import from tests
2025-02-11 09:40:58 +00:00
Sebastian Husch Lee
f9e6e481a1
feat: Add new component CSVDocumentSplitter to recursively split CSV documents (#8815)
* CSV Document Splitter

* Add license header

* Add newline

* Add to docs

* Add lineterminator

* Updated csv splitter to allow user to specify to split by row, column or both

* Adding more tests

* Column tests

* Some refactoring to remove incorrect dropna call

* Fix

* More complicated test

* Adding more relevant metadata to match what's provided in our other splitters

* value error tests

* Fix mypy

* Docstring updates

* Add skip_blank_lines=False

* Add to dict test

* More from and to dict tests

* Fixes

* Move dict creation outside of for loop
2025-02-10 18:10:18 +01:00
David S. Batista
f798a9e935
feat: adding LLMMetadataExtractor (#8833)
* fixing linting

* adding release notes

* updating tests

* adding to pydocs

* fixing typing due to Optional

* fixing docstring
2025-02-10 16:54:25 +00:00
Vladimir Blagojevic
b6ebd3cd77
fix: Update OpenAPIServiceConnector to new ChatMessage (#8817)
* Update OpenAPIServiceConnector to new ChatMessage, bypass model response validation

* Add reno

* Lint fixes

* Add serde pipeline test

* Update haystack/components/connectors/openapi_service.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/connectors/openapi_service.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/connectors/openapi_service.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/connectors/openapi_service.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/connectors/openapi_service.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Add edge case unit test

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2025-02-10 17:36:53 +01:00
Vladimir Blagojevic
fd5040108a
feat: Add OpenAPIConnector component, improve OpenAPI integration (#8808)
* Initial OpenAPIConnector

* Add reno note

* Format

* Add headers

* Add test dep

* Use haystack logger

* Fix test

* Minor fix, spin CI

* Update reno release note format

* Add to docs, pydocs improvements
2025-02-10 10:34:37 +01:00
Vladimir Blagojevic
73bfc08b71
feat: HuggingFaceLocalChatGenerator unified support for tools (#8827)
* Add tools to HuggingFaceLocalChatGenerator

* Add reno

* Fix types

* Small post merge fix

* Add unit tests

* Add tools serde and tests

* PR feedback

* PR feedback
2025-02-10 09:44:51 +01:00
Sebastian Husch Lee
35788a2d06
feat: Update csv cleaner (#8828)
* More refactoring

* Add more new options and more tests

* Improve docstrings

* Add release notes

* Fix pylint
2025-02-07 14:29:53 +01:00
Sebastian Husch Lee
1785ea622e
feat: Add component CSVDocumentCleaner for removing empty rows and columns (#8816)
* Initial commit for csv cleaner

* Add release notes

* Update lineterminator

* Update releasenotes/notes/csv-document-cleaner-8eca67e884684c56.yaml

Co-authored-by: David S. Batista <dsbatista@gmail.com>

* alphabetize

* Use lazy import

* Some refactoring

* Some refactoring

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2025-02-06 17:56:38 +01:00
Stefano Fiorucci
1f257944a6
chore: fix Hugging Face components for mypy 1.15.0 (#8822)
* chore: fix Hugging Face components for mypy 1.15.0

* small fixes

* fix test

* rm print

* use cast and be more permissive
2025-02-06 16:25:59 +00:00
Amna Mubashar
b0809b75f5
feat: Add a ListJoiner component (#8810)
* Add a ListJoiner

* Add tests and release notes
2025-02-05 23:19:14 +01:00
György Orosz
d2348ad462
feat: SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder can accept and pass any arguments to SentenceTransformer.encode (#8806)
* feat: SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder can accept and pass any arguments to SentenceTransformer.encode

* refactor: encode_kwargs parameter of SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder made the last positional parameter for backward compatibility reasons

* docs: added explanation for encode_kwargs in SentenceTransformersTextEmbedder and SentenceTransformersDocumentEmbedder

* test: added tests for encode_kwargs in SentenceTransformersTextEmbedder and SentenceTransformersDocumentEmbedder

* doc: removed empty lines from docstrings of SentenceTransformersTextEmbedder and SentenceTransformersDocumentEmbedder

* refactor: encode_kwargs parameter of SentenceTransformersDocumentEmbedder and SentenceTransformersTextEmbedder made the last positional parameter for backward compatibility (part II.)
2025-02-05 16:09:35 +00:00
Stefano Fiorucci
2828d9e4ae
refactor!: DOCXToDocument converter - store DOCX metadata as a dict (#8804)
* DOCXToDocument - store DOCX metadata as a dict

* do not export DOCXMetadata to converters package
2025-02-05 14:43:19 +01:00
Stefano Fiorucci
5ae94886b2
fix: fix test failures with Transformers models in PRs from forks (#8809)
* trigger

* try pinning sentence transformers

* make integr tests run right away

* pin transformers instead

* older transformers version

* rm transformers pin

* try ignoring cache

* change ubuntu version

* try removing token

* try again

* more HF_API_TOKEN local deletions

* restore test priority

* rm leftover

* more deletions

* moreee

* more

* deletions

* restore jobs order
2025-02-04 19:08:37 +01:00
Sebastian Husch Lee
1ee86b5041
fix: Fix filters to handle date times with timezones (loading and comparison) (#8800)
* Fix date time parsing with timezones and comparison of naive and aware date times.

* Add release note

* Add more filter tests
2025-02-04 14:51:06 +01:00
Stefano Fiorucci
877f826da0
refactor: HF API Embedders - use InferenceClient.feature_extraction instead of InferenceClient.post (#8794)
* HF API Embedders: refactoring

* rename variables

* rm leftovers

* rm pin

* rm unused import

* relnote

* warning with truncate/normalize and serverless inference API

* test that warnings are raised
2025-02-03 15:11:16 +00:00
Sebastian Husch Lee
bba84e5517
fix: Fix JSONConverter to properly skip files that are not utf-8 encoded (#8775)
* Small fix

* Add reno

* Trying out license header fix here
2025-01-28 10:29:55 +01:00