3803 Commits

Author SHA1 Message Date
Stefano Fiorucci
2440a5ee17
chore:PyPDFToDocument - deprecate converter init parameter (#8569)
* deprecat converter in pypdf

* fix linting of MetaFieldGroupingRanker

* linting
2024-11-26 14:47:04 +01:00
Haystack Bot
3d95e06722
chore: Update unstable version to 2.9.0-rc0 (#8582)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-11-26 16:42:12 +05:00
Michele Pangrazzi
f0c3692cf2
Remove is_greedy deprecated argument from @component decorator (#8580)
* Remove 'is_greedy' deprecated argument from @component decorator

* Remove unused import
v2.9.0-rc0
2024-11-26 10:44:50 +00:00
Vladimir Blagojevic
59f1e182db
feat: Add variable to specify inputs as optional to ConditionalRouter (#8568)
* Add optional_variables in ConditionalRouter

* Add reno note

* Add more unit test with various complex scenarios

* Add more unit tests

* Add pylint disable=too-many-positional-arguments

* PR feedback from @sjrl
2024-11-26 10:48:55 +01:00
Matt G
e3b73e048b
fix: bug on tracing where components are in a loop in a pipeline (#8576)
* Fix to tracing parent spans on loops

* Fix linting

* Add release notes

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-25 14:21:08 +01:00
Silvano Cerza
ab840351f8
Fix DocumentCleaner not preserving Document fields (#8578) 2024-11-25 13:08:59 +01:00
Amna Mubashar
9302d3d9f0
feat: Add store_full_path to converters (2/3) (#8573) 2024-11-25 15:22:19 +05:00
Sebastian Husch Lee
eace2a99e5
feat: Add Literal["*"] option to required_variables in ChatPrompBuilder and PromptBuilder (#8572)
* Add new option for required_variables in PromptBuilder and ChatPromptBuilder

* Add reno note

* Add tests
2024-11-22 16:27:50 +01:00
David S. Batista
b5a2fad642
feat: adding Maximum Margin Relevance Ranker (#8554)
* initial import

* linting

* adding MRR tests

* adding release notes

* fixing tests

* adding linting ignore to cross-encoder ranker

* update docstring

* refactoring

* making strategy Optional instead of Literal

* wip: adding unit tests

* refactoring MMR algorithm

* refactoring tests

* cleaning up and updating tests

* adding empty line between license + code

* bug in tests

* using Enum for strategy and similarity metric

* adding more tests

* adding empty line between license + code

* removing run time params

* PR comments

* PR comments

* fixing

* fixing serialisation

* fixing serialisation tests

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/sentence_transformers_diversity.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* fixing tests

* PR comments

* PR comments

* PR comments

* PR comments

---------

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-22 14:58:45 +00:00
Richard Hudson
a8eeb2024f
feat: Allow unverified OpenAPI calls (#8562)
* Feed through ssl_verify value to OpenAPI

* Add release note

* Update serialization methods

* Applied black formatting
2024-11-22 15:45:00 +01:00
Amna Mubashar
4e6c7967d9
fix: Remove deprecated Legacy Filter Tests (#8567) 2024-11-22 14:39:22 +01:00
Amna Mubashar
21906d0558
feat: Add store_full_path to converters (1/3) (#8566)
* Add store_full_path param to 3 converters
2024-11-22 13:55:08 +01:00
Stefano Fiorucci
0b2a299378
fix linting of MetaFieldGroupingRanker (#8570) 2024-11-22 13:37:04 +01:00
Bilge Yücel
5cc2df16ee
Update the Studio part on README.md (#8564) 2024-11-21 12:25:35 +03:00
Ulises M
b1353f4f0f
fix: append runtime meta to ChatMessage's extracted meta in AnswerBuilder (#8544)
* append runtime meta to extracted meta

* add pylint ignore flag to .run()

* explicitly convert reply to string
2024-11-20 20:07:04 +01:00
Julian Risch
2cc45dd5b9
docs: Update discord invite link in README.md (#8560) 2024-11-20 16:27:31 +01:00
Silvano Cerza
3ef8c081be
fix: OpenAIChatGenerator and OpenAIGenerator crashing when streaming with usage tracking (#8558)
* Fix OpenAIGenerator crashing with tracking usage with streaming enabled

* Fix OpenAIChatGenerator crashing with tracking usage with streaming enabled

* Add release notes

* Fix linting
2024-11-20 10:27:22 +01:00
Daria Fokina
3a30ee352e
Update component name in docsrtrings (#8559) 2024-11-19 17:25:15 +01:00
Sebastian Husch Lee
14895f6573
chore: Use token instead of use_auth_token because of deprecation warning (#8552)
* Use token instead of use_auth_token because of deprecation warning

* Fix test

* pylint

* fix linting

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-11-18 11:58:22 +00:00
Silvano Cerza
bd77120cf3
Fix DocumentSplitter not splitting by function (#8549)
* Fix DocumentSplitter not splitting by function

* Make the split_by mapping a constant
2024-11-18 11:54:30 +01:00
Silvano Cerza
cea1e3f07e
Add Zed config folder to .gitignore (#8550) 2024-11-15 17:36:03 +01:00
Ivo Bellin Salarin
c78545dfc0
feat(openai): be tolerant to exceptions (#8526)
* feat: be tolerant to exceptions

if ever an error is raised by the OpenAI API, don't fail the entire processing

* fix: missing import, string separator

* Enhance error handling

* Use batched from more_itertools for compatibility with older Python versions

* Fix batching and add test

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-15 10:52:44 +01:00
Stefano Fiorucci
f085959067
chore: declare requires-python<3.13 in pyproject (#8547)
* restrict to python<3.13

* try unpinning dulwich

* reintroduce dulwich pin
2024-11-15 09:28:39 +00:00
Sebastian Husch Lee
e45d3329a1
feat: Adding DALLE image generator (#8448)
* First pass at adding DALLE image generator

* Add missing header

* Fix tests

* Add tests

* Fix mypy

* Make mypy happy

* More unit tests

* Adding release notes

* Add a test for run

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* Fix pylint

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/generators/openai_dalle.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Amna Mubashar <amnahkhan.ak@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-14 16:19:49 +01:00
Sriniketh J
a045c0eabb
feat: added split by line to DocumentSplitter (#8525)
* feat: added split by line to DocumentSplitter

* fix: pr review comments

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-11-14 16:09:01 +01:00
Sebastian Husch Lee
0c11c7b98e
fix: Bring in fix from custom nodes (#8539)
* Bring in fix from custom nodes

* Add to_dict function and test

* reno

* Fix pylint
2024-11-14 13:00:28 +01:00
Anes Benmerzoug
f5683bc8fa
fix: document joiner division by zero with distribution based rank fusion (#8520)
* Parametrize document joiner tests with empty lists

* Skip loop in _distribution_based_rank_fusion if document list is empty

* Parametrize test_empty_list with join_mode

* Prevent division by zero in _merge and _reciprocal_rank_fusion

* Add release notes

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-11-14 11:41:28 +00:00
David S. Batista
e5a80722c2
feat: adding metadata grouper component (#8512)
* initial import

* making tests more readable; adding docstring

* adding release notes

* adding LICENSE header

* Update test/components/rankers/test_metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* refactoring

* fixing docstring

* fixing types

* test docstrings

* renaming test

* handling too-many-arguments

* liting

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* changing name

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* Update haystack/components/rankers/metadata_grouper.py

Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>

* assiging value inside function for re-use

* improving docstring

* updating name to MetaFieldGroupingRanker

* adding to pydocs

* fixing imports

* adding output docstring

* Update haystack/components/rankers/meta_field_grouper_ranker.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/rankers/__init__.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update releasenotes/notes/add-metadata-grouper-21ec05fd4a307425.yaml

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update test/components/rankers/test_metadata_grouper.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* update docstring tests

* fixing imports

* rename modules for consistency

* fix pydocs

* simplification + more tests

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
2024-11-12 16:01:53 +01:00
David S. Batista
fcdf392bfb
fix: huggingface embedder error messages not being displayed (#8537)
* initial import

* fixing too-many-arguments - since file is now changed pylint checks it

* more fixes

* disable too-arguments pylint
2024-11-12 14:59:20 +01:00
David S. Batista
852900d5e3
lowercase comparision (#8532) 2024-11-11 16:33:54 +01:00
Silvano Cerza
ebb45d3d1e
Remove ddtrace version pin (#8529) 2024-11-11 11:21:10 +01:00
Sebastian Husch Lee
911f3523ab
feat: Increase logging transparency for empty Documents during conversion (#8509)
* Add log lines for PDF conversion and make skipping more explicit in DocumentSplitter

* Add logging statement for PDFMinerToDocument as well

* Add tests

* Remove unused line

* Remove unused line

* add reno

* Add in PDF file

* Update checks in PDF converters and add tests for document splitter

* Revert

* Remove line

* Fix comment

* Make mypy happy

* Make mypy happy
2024-11-04 09:26:57 +01:00
Bohan Qu
2595e68050
feat: Add TTFT support in OpenAI chat generator (#8444)
* feat: Add TTFT support in OpenAI generators

* pylint fixes

* correct disable

---------

Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
2024-10-31 16:56:17 +01:00
Sebastian Husch Lee
294a67e426
feat: Adding StringJoiner (#8357)
* Adding StringJoiner

* Release notes

* Remove typing

* Remove unused import

* Try to fix header

* Fix one test

* Add to docs, move test to behavioral pipeline test

* Undo changes

* Fix test

* Update haystack/components/joiners/string_joiner.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Update haystack/components/joiners/string_joiner.py

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

* Provide usage example

* Apply suggestions from code review

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>

---------

Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-10-30 15:03:41 +00:00
Silvano Cerza
3e30b1db7e Revert afdeaf5a6 and 8271b2d80 2024-10-30 15:41:30 +01:00
Silvano Cerza
8271b2d80b Fix readme_sync.yml unstable version check 2024-10-30 15:34:53 +01:00
Silvano Cerza
afdeaf5a62
Fix readme_sync.yml (#8506) 2024-10-30 15:27:40 +01:00
Haystack Bot
f3fda66861
Update unstable version to 2.8.0-rc0 (#8503)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
2024-10-30 11:54:49 +01:00
Stefano Fiorucci
700684a31c
fix: HuggingFaceAPIGenerator - use forward references (#8502)
* hf API generator: forward references + refactor

* release note
v2.8.0-rc0
2024-10-30 11:51:07 +01:00
Silvano Cerza
8a35e792b9
feat: Add route output type validation in ConditionalRouter (#8500) 2024-10-29 18:06:54 +01:00
Madeesh Kannan
33675b4caf
chore: Remove deprecated DefaultConverter for PyPDFToDocument (#8501)
* chore: Remove deprecated `DefaultConverter` for `PyPDFToDocument`

* Remove unused imports

---------

Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-10-29 16:42:48 +00:00
Bohan Qu
081b143aae
feat!: tracing with concurrency (#8489) 2024-10-29 17:39:41 +01:00
Stefano Fiorucci
2045f6f16a
try test jsonschema (#8496) 2024-10-29 16:21:51 +01:00
Vladimir Blagojevic
28161f7bb9
feat: DOCXToDocument: add table extraction (#8457)
* DOCXToDocument: add table extraction

* Add reno note

* mypy fixes

* add unit tests

* Add csv table support

* Update release note

* Add TableFormat enum

* Add table_format as str init param

* Update docx.py

Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>

* PR feedback

* PR feedback

---------

Co-authored-by: medsriha <medsriha@gmail.com>
Co-authored-by: Mo Sriha <22803208+medsriha@users.noreply.github.com>
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-10-29 16:20:27 +01:00
Silvano Cerza
8205724395
feat: Rework Pipeline.run() to better handle cycles (#8431)
* draft

* Enhance

* Almost works

* Simplify some parts and handle intermediate outputs

* Handle connections with default

* Handle cycles with multiple connections from two components

* Update distributed outputs at the correct time

* Remove Component inputs after it runs

* Add agent pipeline test case

* Fix infite loop test

* Handle some corner cases with loops checking and inputs deletion

* Fix tests

* Add new behavioral test

* Remove unused code in behavioural test

* Fix behavioural test

* Fix max run check

* Simplify outputs distribution

* Simplify subgraph run check

* Remove unused _init_run_queue function

* Remove commented code

* Add some missing type hints

* Simplify cycles breaking

* Fix _distribute_output test

* Fix _find_components_that_will_receive_no_input test

* Fix validation test

* Fix tracer losing Component inputs

* Fix some linting issues

* Remove ignore pylint rule

* Rename method that break cycles and make it raise

* Add docstring to _run_subgraph

* Update Pipeline.run() docstring

* Update comment to clarify cycles execution

* Remove SelfLoop sample Component

* Add behavioural test for unsupported cycles

* Rename behavioural test to be more specific

* Add new behavioural test

* Add release notes

* Remove commented out code and random pass

* Use more efficient function to find cycles

* Simplify _break_supported_cycles_in_graph by using defaultdict

* Stop breaking edges as soon as we make the graph acyclic

* Fix docstring and add some more comments

* Fix _distribute_output docstring

* Fix _find_receivers_from docstring

* More detailed release notes

* Minimize calls to networkx.is_directed_acyclic_graph

* Add some more info on edges keys

* Adjust components_in_cycles comment

* Add new Pipeline behavioural test

* Enhance _find_components_that_will_receive_no_input to cover more cases

* Explain why run_queue is reset after running a subgraph cycle

* Rename _init_inputs_state to _normalize_input_data

* Better explain the subgraph output distribution

* Remove for else

* Fix some comments and docstrings

* Fix linting

* Add missing return type

* Fix typo

* Rename _normalize_input_data to _normalize_varidiac_input_data and add more documentation

* Remove unused import

---------

Co-authored-by: Sebastian Husch Lee <sjrl423@gmail.com>
2024-10-29 15:43:16 +01:00
tstadel
d430833f8f
feat: streaming_callback as run param from HF generators (#8406)
* feat: streaming_callback as run param from HF generators

* apply feedback

* add reno

* fix test

* fix test

* fix mypy

* fix excessive linting rule
2024-10-29 15:32:06 +01:00
Amna Mubashar
811a54a3ef
Fix filter syntax validation (#8477) 2024-10-29 13:21:49 +01:00
Stefano Fiorucci
c7b898994e
build: unpin numpy + use Python 3.9 in CI (#8492)
* try unpinning numpy

* try python 3.9

* release note
2024-10-28 12:15:17 +01:00
Stefano Fiorucci
78292422f0
feat: allow passing meta in the run method of FileTypeRouter (#8486)
* initial refactoring

* progress

* refinements

* serde methods + tests

* release note

* comment

* make additional_mimetypes internal attribute
2024-10-24 16:21:15 +02:00
Bilge Yücel
c24814c6f2
Add info to README for installation from main branch (#8484)
* Update some old links
2024-10-24 13:46:18 +03:00