1524 Commits

Author SHA1 Message Date
Sebastian Husch Lee
28ad78c73d
feat: Add XLSXToDocument converter (#8522)
* Add draft of the Excel To Document converter

* Add license header

* Add release note

* Use Union instead of pipe

* Add openpyxl as additional dep

* Fix zip issue

* few updates from Bijay

* Update deps

* Add markdown test

* Adding more example excels and expanding tests

* Added more tests

* Fix windows test by setting lineterminator

* Addressing PR comments

* PR comments

* Fix linting
2025-01-09 09:03:19 +01:00
Stefano Fiorucci
bc30105fbc
test: reorganize docstore test suite to isolate dataframe tests (#8684)
* reorganize docstore test suite to isolate dataframe tests

* improve docstring

* include FilterDocumentsTestWithDataframe in InMemoryDocumentStore tests
2025-01-08 14:58:52 +00:00
Stefano Fiorucci
5539f6c33f
refactor: improve serialization/deserialization of callables (to handle class methods and static methods) (#8683)
* progress

* refinements

* tidy up

* release note
2025-01-08 11:28:00 +01:00
tstadel
e6059e632e
fix: truncate ByteStream string representation (#8673)
* fix: truncate ByteStream string representation

* add reno

* better reno

* add test

* Update test_byte_stream.py

* apply feedback

* update reno
2025-01-07 19:00:52 +01:00
Bohan Qu
8e3f64717f
feat: use importlib when deserializing callables (#8648) 2025-01-03 15:06:58 +01:00
Stefano Fiorucci
7b4d9ba86e
feat: introduce class method to create ChatMessage from the OpenAI dictionary format (#8670)
* add ChatMessage.from_openai_dict_format

* remove print

* release note

* improve docstring

* separate validation logic

* rm obvious comment
2025-01-02 10:34:41 +00:00
Stefano Fiorucci
188b2a7f06
feat: support for tools in OpenAIChatGenerator (#8666)
* move chatmsg>openai conversion to chatmsg dataclass

* implementation and tests cleanup

* release note

* try fixing azure chat generator

* add serde test for toolinvoker

* small fix
2024-12-20 14:20:54 +00:00
Stefano Fiorucci
7dcbf25bd7
feat: add Tool Invoker component (#8664)
* port toolinvoker

* release note
2024-12-20 14:02:42 +01:00
Michele Pangrazzi
c192488bf6
Named entity extractor private models (#8658)
* add 'token' support to NamedEntityExtractor to enable using private models on HF backend

* fix existing error message format

* add release note

* add HF_API_TOKEN to e2e workflow

* add informative comment

* Updated to_dict / from_dict to handle 'token' correctly ; Added tests

* Fix lint

* Revert unwanted change
2024-12-20 11:15:55 +01:00
Sebastian Husch Lee
286061f005
fix: Move potential nltk download to warm_up (#8646)
* Move potential nltk download to warm_up

* Update tests

* Add release notes

* Fix tests

* Uncomment

* Make mypy happy

* Add RuntimeError message

* Update release notes

---------

Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-12-20 10:41:44 +01:00
Stefano Fiorucci
f4d9c2bb91
fix: Make the HuggingFaceLocalChatGenerator compatible with the new ChatMessage; serialize chat_template (#8663)
* message conversion function

* hfapi w tools

* right test file + hf_hub version

* release note

* fix for new chatmessage; serialize chat_template

* feedback
2024-12-19 15:12:12 +01:00
Stefano Fiorucci
2bc58d2987
feat: support for tools in HuggingFaceAPIChatGenerator (#8661)
* message conversion function

* hfapi w tools

* right test file + hf_hub version

* release note

* feedback
2024-12-19 15:04:37 +01:00
David S. Batista
c306bee665
fix: adding missing abbreviations files for SentenceSplitter (#8660)
* adding missing abbreviations files for SentenceSplitter

* fixing tests path
2024-12-19 11:08:29 +01:00
Stefano Fiorucci
96b4a1d2fd
feat: Tool dataclass - unified abstraction to represent tools (#8652)
* draft

* del HF token in tests

* adaptations

* progress

* fix type

* import sorting

* more control on deserialization

* release note

* improvements

* support name field

* fix chatpromptbuilder test

* port Tool from experimental

* release note

* docs upd

* Update tool.py

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-18 11:36:44 +00:00
Stefano Fiorucci
ea3602643a
feat!: new ChatMessage (#8640)
* draft

* del HF token in tests

* adaptations

* progress

* fix type

* import sorting

* more control on deserialization

* release note

* improvements

* support name field

* fix chatpromptbuilder test

* Update chat_message.py

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-12-17 17:02:04 +01:00
Stefano Fiorucci
f2b5f123b3
del HF token in tests (#8634) 2024-12-13 09:50:23 +01:00
David S. Batista
3f77d3ab6c
!feat: unify NLTKDocumentSplitter and DocumentSplitter (#8617)
* wip: initial import

* wip: refactoring

* wip: refactoring tests

* wip: refactoring tests

* making all NLTKSplitter related tests work

* refactoring

* docstrings

* refactoring and removing NLTKDocumentSplitter

* fixing tests for custom sentence tokenizer

* fixing tests for custom sentence tokenizer

* cleaning up

* adding release notes

* reverting some changes

* cleaning up tests

* fixing serialisation and adding tests

* cleaning up

* wip

* renaming and cleaning

* adding NLTK files

* updating docstring

* adding import to init

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* updating tests

* wip

* adding sentence/period change warning

* fixing LICENSE header

* Update haystack/components/preprocessors/document_splitter.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-12 14:22:27 +00:00
Michele Pangrazzi
21d53d0ec6
update default value of 'store_full_path' to False in converters (#8619) 2024-12-10 16:03:38 +01:00
David S. Batista
248dccbdd3
chore: fixing pylint issues (#8610)
* initial import

* fixing internal methods

* fixing some internal methods

* modify _preprocess

* fixed internal methods

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-12-09 16:53:37 +00:00
Anton Pelykh
6f983a22ca
fix: add missing stream mime type assignment to the LinkContentFetcher (#8596)
* add missing stream mime type assignment to the `LinkContentFetcher`

* fix release note fmt

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-12-09 14:51:14 +00:00
Michele Pangrazzi
b32f85cca2
remove deprecated 'converter' init parameter from PyPDFToDocument component (#8609) 2024-12-06 15:43:43 +01:00
David S. Batista
2282c26f17
feat!: SentenceWindowRetriever returns List[Document] with docs ordered by split_idx_start (#8590)
* initial import

* adding a few pylint disable

* adding tests

* fixing integration tests

* adding release notes

* fixing types and docstrings
2024-12-04 16:55:56 +01:00
Amna Mubashar
4c8eb54049
feat: Add store_full_path to converters (3/3) (#8585)
* Add store_full_path params
2024-12-03 13:48:56 +05:00
Stefano Fiorucci
de7099e560
ci: add job to check imports (#8594)
* try checking imports

* clarify error message

* better fmt

* do not show complete list of successfully imported packages

* refinements

* relnote

* add missing forward references

* better function name

* linting

* fix linting

* Update .github/utils/check_imports.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

---------

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-11-29 14:00:59 +00:00
Stefano Fiorucci
c8685aa141
refactor: update components to access ChatMessage.text instead of content (#8589)
* introduce text property and deprecate content

* release note

* use chatmessage.text

* release note

* linting
2024-11-28 10:16:07 +00:00
Stefano Fiorucci
fb1baf4921
refactor: ChatMessage - introduce text property and deprecate content (#8588)
* introduce text property and deprecate content

* release note

* minor test refactoring

---------

Co-authored-by: Michele Pangrazzi <xmikex83@gmail.com>
2024-11-28 09:53:02 +00:00
Stefano Fiorucci
51c1390426
chore: use class methods to create ChatMessage (#8581)
* use class methods to build messages

* fix failing format
2024-11-28 09:35:24 +00:00
Stefano Fiorucci
fb42c035c5
feat: PyPDFToDocument - add new customization parameters (#8574)
* deprecat converter in pypdf

* fix linting of MetaFieldGroupingRanker

* linting

* pypdftodocument: add customization params

* fix mypy

* incorporate feedback
2024-11-26 16:37:59 +01:00
Vladimir Blagojevic
59f1e182db
feat: Add variable to specify inputs as optional to ConditionalRouter (#8568)
* Add optional_variables in ConditionalRouter

* Add reno note

* Add more unit test with various complex scenarios

* Add more unit tests

* Add pylint disable=too-many-positional-arguments

* PR feedback from @sjrl
2024-11-26 10:48:55 +01:00
Silvano Cerza
ab840351f8
Fix DocumentCleaner not preserving Document fields (#8578) 2024-11-25 13:08:59 +01:00
Amna Mubashar
9302d3d9f0
feat: Add store_full_path to converters (2/3) (#8573) 2024-11-25 15:22:19 +05:00
Sebastian Husch Lee
eace2a99e5
feat: Add Literal["*"] option to required_variables in ChatPrompBuilder and PromptBuilder (#8572)
* Add new option for required_variables in PromptBuilder and ChatPromptBuilder

* Add reno note

* Add tests
2024-11-22 16:27:50 +01:00
David S. Batista
b5a2fad642
feat: adding Maximum Margin Relevance Ranker (#8554)
* initial import

* linting

* adding MRR tests

* adding release notes

* fixing tests

* adding linting ignore to cross-encoder ranker

* update docstring

* refactoring

* making strategy Optional instead of Literal

* wip: adding unit tests

* refactoring MMR algorithm

* refactoring tests

* cleaning up and updating tests

* adding empty line between license + code

* bug in tests

* using Enum for strategy and similarity metric

* adding more tests

* adding empty line between license + code

* removing run time params

* PR comments

* PR comments

* fixing

* fixing serialisation

* fixing serialisation tests

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* fixing tests

* PR comments

* PR comments

* PR comments

* PR comments

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-22 14:58:45 +00:00
Richard Hudson
a8eeb2024f
feat: Allow unverified OpenAPI calls (#8562)
* Feed through ssl_verify value to OpenAPI

* Add release note

* Update serialization methods

* Applied black formatting
2024-11-22 15:45:00 +01:00
Amna Mubashar
21906d0558
feat: Add store_full_path to converters (1/3) (#8566)
* Add store_full_path param to 3 converters
2024-11-22 13:55:08 +01:00
Ulises M
b1353f4f0f
fix: append runtime meta to ChatMessage's extracted meta in AnswerBuilder (#8544)
* append runtime meta to extracted meta

* add pylint ignore flag to .run()

* explicitly convert reply to string
2024-11-20 20:07:04 +01:00
Silvano Cerza
3ef8c081be
fix: OpenAIChatGenerator and OpenAIGenerator crashing when streaming with usage tracking (#8558)
* Fix OpenAIGenerator crashing with tracking usage with streaming enabled

* Fix OpenAIChatGenerator crashing with tracking usage with streaming enabled

* Add release notes

* Fix linting
2024-11-20 10:27:22 +01:00
Sebastian Husch Lee
14895f6573
chore: Use token instead of use_auth_token because of deprecation warning (#8552)
* Use token instead of use_auth_token because of deprecation warning

* Fix test

* pylint

* fix linting

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-11-18 11:58:22 +00:00
Silvano Cerza
bd77120cf3
Fix DocumentSplitter not splitting by function (#8549)
* Fix DocumentSplitter not splitting by function

* Make the split_by mapping a constant
2024-11-18 11:54:30 +01:00
Ivo Bellin Salarin
c78545dfc0
feat(openai): be tolerant to exceptions (#8526)
* feat: be tolerant to exceptions

if ever an error is raised by the OpenAI API, don't fail the entire processing

* fix: missing import, string separator

* Enhance error handling

* Use batched from more_itertools for compatibility with older Python versions

* Fix batching and add test

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-15 10:52:44 +01:00
Sebastian Husch Lee
e45d3329a1
feat: Adding DALLE image generator (#8448)
* First pass at adding DALLE image generator

* Add missing header

* Fix tests

* Add tests

* Fix mypy

* Make mypy happy

* More unit tests

* Adding release notes

* Add a test for run

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Fix pylint

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-14 16:19:49 +01:00
Sriniketh J
a045c0eabb
feat: added split by line to DocumentSplitter (#8525)
* feat: added split by line to DocumentSplitter

* fix: pr review comments

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-11-14 16:09:01 +01:00
Sebastian Husch Lee
0c11c7b98e
fix: Bring in fix from custom nodes (#8539)
* Bring in fix from custom nodes

* Add to_dict function and test

* reno

* Fix pylint
2024-11-14 13:00:28 +01:00
Anes Benmerzoug
f5683bc8fa
fix: document joiner division by zero with distribution based rank fusion (#8520)
* Parametrize document joiner tests with empty lists

* Skip loop in _distribution_based_rank_fusion if document list is empty

* Parametrize test_empty_list with join_mode

* Prevent division by zero in _merge and _reciprocal_rank_fusion

* Add release notes

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-14 11:41:28 +00:00
David S. Batista
e5a80722c2
feat: adding metadata grouper component (#8512)
* initial import

* making tests more readable; adding docstring

* adding release notes

* adding LICENSE header

* Update test/components/rankers/test_metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* refactoring

* fixing docstring

* fixing types

* test docstrings

* renaming test

* handling too-many-arguments

* liting

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* changing name

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* assiging value inside function for re-use

* improving docstring

* updating name to MetaFieldGroupingRanker

* adding to pydocs

* fixing imports

* adding output docstring

* Update haystack/components/rankers/meta_field_grouper_ranker.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/rankers/__init__.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update releasenotes/notes/add-metadata-grouper-21ec05fd4a307425.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update test/components/rankers/test_metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* update docstring tests

* fixing imports

* rename modules for consistency

* fix pydocs

* simplification + more tests

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-12 16:01:53 +01:00
David S. Batista
852900d5e3
lowercase comparision (#8532) 2024-11-11 16:33:54 +01:00
Sebastian Husch Lee
911f3523ab
feat: Increase logging transparency for empty Documents during conversion (#8509)
* Add log lines for PDF conversion and make skipping more explicit in DocumentSplitter

* Add logging statement for PDFMinerToDocument as well

* Add tests

* Remove unused line

* Remove unused line

* add reno

* Add in PDF file

* Update checks in PDF converters and add tests for document splitter

* Revert

* Remove line

* Fix comment

* Make mypy happy

* Make mypy happy
2024-11-04 09:26:57 +01:00
Bohan Qu
2595e68050
feat: Add TTFT support in OpenAI chat generator (#8444)
* feat: Add TTFT support in OpenAI generators

* pylint fixes

* correct disable

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-10-31 16:56:17 +01:00
Sebastian Husch Lee
294a67e426
feat: Adding StringJoiner (#8357)
* Adding StringJoiner

* Release notes

* Remove typing

* Remove unused import

* Try to fix header

* Fix one test

* Add to docs, move test to behavioral pipeline test

* Undo changes

* Fix test

* Update haystack/components/joiners/string_joiner.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/joiners/string_joiner.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Provide usage example

* Apply suggestions from code review

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-10-30 15:03:41 +00:00
Silvano Cerza
8a35e792b9
feat: Add route output type validation in ConditionalRouter (#8500) 2024-10-29 18:06:54 +01:00