331 Commits

Author SHA1 Message Date
Michele Pangrazzi
b32f85cca2
remove deprecated 'converter' init parameter from PyPDFToDocument component (#8609) 2024-12-06 15:43:43 +01:00
David S. Batista
2282c26f17
feat!: SentenceWindowRetriever returns List[Document] with docs ordered by split_idx_start (#8590)
* initial import

* adding a few pylint disable

* adding tests

* fixing integration tests

* adding release notes

* fixing types and docstrings
2024-12-04 16:55:56 +01:00
Amna Mubashar
4c8eb54049
feat: Add store_full_path to converters (3/3) (#8585)
* Add store_full_path params
2024-12-03 13:48:56 +05:00
Stefano Fiorucci
c8685aa141
refactor: update components to access ChatMessage.text instead of content (#8589)
* introduce text property and deprecate content

* release note

* use chatmessage.text

* release note

* linting
2024-11-28 10:16:07 +00:00
Stefano Fiorucci
51c1390426
chore: use class methods to create ChatMessage (#8581)
* use class methods to build messages

* fix failing format
2024-11-28 09:35:24 +00:00
Stefano Fiorucci
fb42c035c5
feat: PyPDFToDocument - add new customization parameters (#8574)
* deprecate converter in pypdf

* fix linting of MetaFieldGroupingRanker

* linting

* pypdftodocument: add customization params

* fix mypy

* incorporate feedback
2024-11-26 16:37:59 +01:00
Vladimir Blagojevic
59f1e182db
feat: Add variable to specify inputs as optional to ConditionalRouter (#8568)
* Add optional_variables in ConditionalRouter

* Add reno note

* Add more unit test with various complex scenarios

* Add more unit tests

* Add pylint disable=too-many-positional-arguments

* PR feedback from @sjrl
2024-11-26 10:48:55 +01:00
Silvano Cerza
ab840351f8
Fix DocumentCleaner not preserving Document fields (#8578) 2024-11-25 13:08:59 +01:00
Amna Mubashar
9302d3d9f0
feat: Add store_full_path to converters (2/3) (#8573) 2024-11-25 15:22:19 +05:00
Sebastian Husch Lee
eace2a99e5
feat: Add Literal["*"] option to required_variables in ChatPromptBuilder and PromptBuilder (#8572)
* Add new option for required_variables in PromptBuilder and ChatPromptBuilder

* Add reno note

* Add tests
2024-11-22 16:27:50 +01:00
David S. Batista
b5a2fad642
feat: adding Maximum Margin Relevance Ranker (#8554)
* initial import

* linting

* adding MMR tests

* adding release notes

* fixing tests

* adding linting ignore to cross-encoder ranker

* update docstring

* refactoring

* making strategy Optional instead of Literal

* wip: adding unit tests

* refactoring MMR algorithm

* refactoring tests

* cleaning up and updating tests

* adding empty line between license + code

* bug in tests

* using Enum for strategy and similarity metric

* adding more tests

* adding empty line between license + code

* removing run time params

* PR comments

* PR comments

* fixing

* fixing serialisation

* fixing serialisation tests

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* fixing tests

* PR comments

* PR comments

* PR comments

* PR comments

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-22 14:58:45 +00:00
Richard Hudson
a8eeb2024f
feat: Allow unverified OpenAPI calls (#8562)
* Feed through ssl_verify value to OpenAPI

* Add release note

* Update serialization methods

* Applied black formatting
2024-11-22 15:45:00 +01:00
Amna Mubashar
21906d0558
feat: Add store_full_path to converters (1/3) (#8566)
* Add store_full_path param to 3 converters
2024-11-22 13:55:08 +01:00
Ulises M
b1353f4f0f
fix: append runtime meta to ChatMessage's extracted meta in AnswerBuilder (#8544)
* append runtime meta to extracted meta

* add pylint ignore flag to .run()

* explicitly convert reply to string
2024-11-20 20:07:04 +01:00
Silvano Cerza
3ef8c081be
fix: OpenAIChatGenerator and OpenAIGenerator crashing when streaming with usage tracking (#8558)
* Fix OpenAIGenerator crashing with tracking usage with streaming enabled

* Fix OpenAIChatGenerator crashing with tracking usage with streaming enabled

* Add release notes

* Fix linting
2024-11-20 10:27:22 +01:00
Sebastian Husch Lee
14895f6573
chore: Use token instead of use_auth_token because of deprecation warning (#8552)
* Use token instead of use_auth_token because of deprecation warning

* Fix test

* pylint

* fix linting

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-11-18 11:58:22 +00:00
Silvano Cerza
bd77120cf3
Fix DocumentSplitter not splitting by function (#8549)
* Fix DocumentSplitter not splitting by function

* Make the split_by mapping a constant
2024-11-18 11:54:30 +01:00
Ivo Bellin Salarin
c78545dfc0
feat(openai): be tolerant to exceptions (#8526)
* feat: be tolerant to exceptions

if ever an error is raised by the OpenAI API, don't fail the entire processing

* fix: missing import, string separator

* Enhance error handling

* Use batched from more_itertools for compatibility with older Python versions

* Fix batching and add test

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-15 10:52:44 +01:00
Sebastian Husch Lee
e45d3329a1
feat: Adding DALLE image generator (#8448)
* First pass at adding DALLE image generator

* Add missing header

* Fix tests

* Add tests

* Fix mypy

* Make mypy happy

* More unit tests

* Adding release notes

* Add a test for run

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Fix pylint

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-14 16:19:49 +01:00
Sriniketh J
a045c0eabb
feat: added split by line to DocumentSplitter (#8525)
* feat: added split by line to DocumentSplitter

* fix: pr review comments

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-11-14 16:09:01 +01:00
Sebastian Husch Lee
0c11c7b98e
fix: Bring in fix from custom nodes (#8539)
* Bring in fix from custom nodes

* Add to_dict function and test

* reno

* Fix pylint
2024-11-14 13:00:28 +01:00
Anes Benmerzoug
f5683bc8fa
fix: document joiner division by zero with distribution based rank fusion (#8520)
* Parametrize document joiner tests with empty lists

* Skip loop in _distribution_based_rank_fusion if document list is empty

* Parametrize test_empty_list with join_mode

* Prevent division by zero in _merge and _reciprocal_rank_fusion

* Add release notes

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-14 11:41:28 +00:00
David S. Batista
e5a80722c2
feat: adding metadata grouper component (#8512)
* initial import

* making tests more readable; adding docstring

* adding release notes

* adding LICENSE header

* Update test/components/rankers/test_metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* refactoring

* fixing docstring

* fixing types

* test docstrings

* renaming test

* handling too-many-arguments

* linting

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* changing name

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* assigning value inside function for re-use

* improving docstring

* updating name to MetaFieldGroupingRanker

* adding to pydocs

* fixing imports

* adding output docstring

* Update haystack/components/rankers/meta_field_grouper_ranker.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/rankers/__init__.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update releasenotes/notes/add-metadata-grouper-21ec05fd4a307425.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update test/components/rankers/test_metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* update docstring tests

* fixing imports

* rename modules for consistency

* fix pydocs

* simplification + more tests

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-12 16:01:53 +01:00
David S. Batista
852900d5e3
lowercase comparison (#8532) 2024-11-11 16:33:54 +01:00
Sebastian Husch Lee
911f3523ab
feat: Increase logging transparency for empty Documents during conversion (#8509)
* Add log lines for PDF conversion and make skipping more explicit in DocumentSplitter

* Add logging statement for PDFMinerToDocument as well

* Add tests

* Remove unused line

* Remove unused line

* add reno

* Add in PDF file

* Update checks in PDF converters and add tests for document splitter

* Revert

* Remove line

* Fix comment

* Make mypy happy

* Make mypy happy
2024-11-04 09:26:57 +01:00
Bohan Qu
2595e68050
feat: Add TTFT support in OpenAI chat generator (#8444)
* feat: Add TTFT support in OpenAI generators

* pylint fixes

* correct disable

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-10-31 16:56:17 +01:00
Sebastian Husch Lee
294a67e426
feat: Adding StringJoiner (#8357)
* Adding StringJoiner

* Release notes

* Remove typing

* Remove unused import

* Try to fix header

* Fix one test

* Add to docs, move test to behavioral pipeline test

* Undo changes

* Fix test

* Update haystack/components/joiners/string_joiner.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/joiners/string_joiner.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Provide usage example

* Apply suggestions from code review

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-10-30 15:03:41 +00:00
Silvano Cerza
8a35e792b9
feat: Add route output type validation in ConditionalRouter (#8500) 2024-10-29 18:06:54 +01:00
Madeesh Kannan
33675b4caf
chore: Remove deprecated DefaultConverter for PyPDFToDocument (#8501)
* chore: Remove deprecated `DefaultConverter` for `PyPDFToDocument`

* Remove unused imports

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-10-29 16:42:48 +00:00
Vladimir Blagojevic
28161f7bb9
feat: DOCXToDocument: add table extraction (#8457)
* DOCXToDocument: add table extraction

* Add reno note

* mypy fixes

* add unit tests

* Add csv table support

* Update release note

* Add TableFormat enum

* Add table_format as str init param

* Update docx.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* PR feedback

* PR feedback

---------

Co-authored-by: medsriha <medsriha@gmail.com>
Co-authored-by: Mo Sriha <22803208+medsriha@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-10-29 16:20:27 +01:00
tstadel
d430833f8f
feat: streaming_callback as run param from HF generators (#8406)
* feat: streaming_callback as run param from HF generators

* apply feedback

* add reno

* fix test

* fix test

* fix mypy

* fix excessive linting rule
2024-10-29 15:32:06 +01:00
Stefano Fiorucci
78292422f0
feat: allow passing meta in the run method of FileTypeRouter (#8486)
* initial refactoring

* progress

* refinements

* serde methods + tests

* release note

* comment

* make additional_mimetypes internal attribute
2024-10-24 16:21:15 +02:00
Alper
a556e11bf1
fix: window_size set during run instead of construction (#8463)
* window_size set during runtime

* revert init and update run with window_size

* improved doc, removed print

* adding release notes

* updating tests

* reverting docstring example

* Update haystack/components/retrievers/sentence_window_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/components/retrievers/sentence_window_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Update haystack/components/retrievers/sentence_window_retriever.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-10-22 14:01:26 +00:00
David S. Batista
3a50d35f06
feat: allow Generators to run with a system prompt defined at run time (#8423)
* initial import

* Update haystack/components/generators/openai.py

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* docs: fixing

* supporting the three use cases: no system prompt, using system prompt defined at init, using system prompt defined at run time

* renaming 'run_time_system_prompt' to 'system_prompt'

* adding tests, converting methods to static

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-10-22 11:21:10 +02:00
Ajit Singh
6cf13e8b98
enhancement: reduced usage of numpy and substituted built-in libraries (#8418)
* reduced usage of numpy and substituted built-in libraries

* added release note

* edited expit function to support both float and list inputs (this case was causing an error in CI)

* revert code, numpy can't be removed here

* more cleaning

* fix relnote

---------

Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
2024-10-18 15:42:19 +02:00
Stefano Fiorucci
dfd339ca2d
ensure compatibility with huggingface_hub==0.26.0 (#8464) 2024-10-18 08:38:48 +00:00
Alper
b40f0c8b5d
feat: SentenceTransformersTextEmbedder supports config_kwargs (#8432)
* add config_kwargs

* disable PLR0913 for a specific function

* add a release note

* refer to AutoConfig in config_kwargs docstring

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: Julian Risch <julianrisch@gmx.de>
2024-10-14 16:08:53 +00:00
David S. Batista
b81abc0c85
feat: SentenceTransformersDocumentEmbedder supports config_kwargs (#8433)
* initial import

* adding release notes
2024-10-14 17:43:04 +02:00
David S. Batista
5867fa1f34
fix: whisper transcription test use github url + update test (#8455)
* adding audio file

* changing URL

* updating tests

* temporarily removing failing test

* updating tests

* removing failing test

* typo

* linting

* fixing URL

* updating tests
2024-10-14 16:24:52 +02:00
David S. Batista
a50593ede0
fix: whisper tests using audio file from our github repo (#8454)
* adding audio file

* temporarily removing failing test

* removing failing test
2024-10-14 12:56:37 +02:00
Madeesh Kannan
e7bfd80f3b
fix: (Temporarily) Re-add support for pre-2.6.0 YAMLs with PyPDFConverter (#8443) 2024-10-08 14:35:43 +02:00
Madeesh Kannan
ee89f6ad57
fix: PyPDFToDocument correctly serializes custom converters, deprecate DefaultConverter (#8430)
* fix: `PyPDFToDocument` correctly serializes custom converters, deprecate `DefaultConverter`

* Remove `auto` prefix from serde util function names, add unit tests
2024-10-01 16:35:38 +02:00
Julian Risch
08686d90af
feat: Add DocumentNDCGEvaluator component (#8419)
* draft new component and tests

* draft new component and tests

* fix tests, replace usage of get_attr

* improve docstrings, refactor tests

* add test for mixed documents w/wo scores

* add test with multiple lists and update docstring

* validate inputs, add tests, make methods static

* change fallback to binary relevance

* rename validate_init_parameters to validate_inputs
2024-10-01 16:15:02 +02:00
Silvano Cerza
d6f073f9b3
Revert "fix: make pypdf converter more robust (#8427)" (#8428)
This reverts commit d234c75168dcb49866a6714aa232f37d56f72cab.
2024-10-01 11:55:25 +02:00
Tobias Wochinger
d234c75168
fix: make pypdf converter more robust (#8427)
* fix: make `from_dict` of `PyPDFToDocument` more robust

* chore: drop trailing space

* converting method to static and making the comment shorter

* reverting method to static

---------

Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-09-30 16:47:23 +00:00
Silvano Cerza
29672d4b42
feat: Add JSONConverter Component (#8397)
* Add JSONConverter Component

* Handle some corner cases

* Add JSONConverter to pydoc config

* Add a way to extract all non content fields as metadata

* Small fix in docstring

* Fix tests

* docstrings upd

* Update json.py

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-09-25 12:34:51 +02:00
Vladimir Blagojevic
09b95746a2
feat: HuggingFaceAPIChatGenerator add token usage data (#8375)
* Ensure HuggingFaceAPIChatGenerator has token usage data

* Add reno note

* Fix release note
2024-09-23 15:40:50 +02:00
Sriniketh J
066e2e3ec5
Make api_key param optional in LLMEvaluator (#8340) 2024-09-20 10:47:13 +02:00
Sebastian Husch Lee
2235ce673f
test: Move pipeline test to behavioral tests (#8377) 2024-09-19 16:59:35 +02:00
Vladimir Blagojevic
514e0abc39
fix: Fix nltk imports (#8381) 2024-09-18 11:25:21 +00:00