Massimiliano Pippi
19a41a669e
docs: update contributor guidelines ( #7854 )
...
* update contributor guidelines
* remove occurrences of core-integrations
* punctuation
* Apply suggestions from code review
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* link the contributions wanted repo
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
2024-06-14 08:45:32 +02:00
Stefano Fiorucci
8de639bd70
DocxDocument forward reference ( #7852 )
2024-06-13 11:29:31 +02:00
Carlos Fernández
c1c339923f
feat: add DocxToDocument converter ( #7838 )
...
* first functioning DocxFileToDocument
* fix lazy import message
* add reno
* Add license header
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* change DocxFileToDocument to DocxToDocument
* Update library install to the maintained version
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* clean try-except to only take non-haystack errors into account
* Add warning on docstring of component ignoring page breaks, mark test as skip
* make warnings lazy evaluations
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* make warnings lazy evaluations
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Make warnings lazy evaluated
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Solve f bug
* Get more metadata from docx files
* add 'python-docx' dependency and docs
* Change logging import
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Fix typo
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* remake metadata extraction for docx
* solve bug regarding _get_docx_metadata method
* Update haystack/components/converters/docx.py
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Update haystack/components/converters/docx.py
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Delete unused test
---------
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-06-12 11:58:36 +02:00
Rob Pasternak
28dd0f5596
feat: Add options for what to do with missing metadata fields in MetaFieldRanker ( #7700 )
...
* Add `missing_meta` param to `MetaFieldRanker`, plus checks for validation.
* Implement `missing_meta` functionality in `run()`.
* Finish first draft of revised `MetaFieldRanker` functionality.
* Add tests for `MetaFieldRanker` `missing_meta` functionality.
* Add `missing_meta` param to `MetaFieldRanker`, plus checks for validation.
* Implement `missing_meta` functionality in `run()`.
* Finish first draft of revised `MetaFieldRanker` functionality.
* Add tests for `MetaFieldRanker` `missing_meta` functionality.
* Add release notes for new `missing_meta` param of `MetaFieldRanker`
* Move part of docs_missing_meta_field warning string outside of `if...elif...else`.
2024-06-12 10:42:02 +02:00
Silvano Cerza
14c7b02a4c
refactor: Isolate logic to check if a Component can run ( #7840 )
...
* Isolate run check
* Update docstrings and remove unnecessary set
* Rename argument
2024-06-11 16:14:04 +02:00
Silvano Cerza
58dd972d1a
refactor: Isolate code that runs single Pipeline Component ( #7837 )
...
* Isolate code that runs single Pipeline Component
* Fix mypy
2024-06-10 16:03:14 +00:00
Carlos Fernández
7fe0244258
feat: add methods to remove and replace components in a pipeline ( #7820 )
...
* add remove_component method plus unit tests
* add docstrings
* add reno
* add type annotation to remove_component method
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* solve bug not allowing a component to be reattached to a pipeline after being removed
* Properly remove Component from Pipeline
* Ignore mypy
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-06-10 14:54:07 +02:00
Alejandro Lazaro
639ee598fd
ci: only run under deepset-ai GitHub org ( #7834 )
...
Since the workflow doesn't seem intended for forks, I decided to
disable it altogether for runs outside `deepset-ai` org.
Fix #7833
2024-06-10 10:51:41 +02:00
Luis Guimarais
48fd8865e5
Added NOS Portugal to list of companies in README.md ( #7832 )
2024-06-08 14:48:54 +02:00
Madeesh Kannan
63226dad34
fix: Fix LLMEvaluator serialization ( #7818 )
...
* fix: Fix `LLMEvaluator` serialization
* `reno`
2024-06-07 12:49:23 +02:00
Sebastian Husch Lee
2c2c7c9f56
feat: Add PPTXToDocument converter ( #7808 )
...
* Add first pass at PPTXToDocument converter
* Add test and update code
* Add doc string
* Update docstrings
* Add release notes
* remove unused imports, add to api docs, update pyproject.toml
* Add a new test
* Add dep so tests can run
2024-06-07 09:43:29 +00:00
David S. Batista
276ff3c104
test evaluation pipeline failing ( #7823 )
2024-06-07 11:26:18 +02:00
Stefano Fiorucci
bde92fda67
upgrade transformers and reorganize extras ( #7815 )
2024-06-06 15:57:18 +02:00
Silvano Cerza
3c8569e12c
fix: Fix running Pipeline with conditional branch and Component with default inputs ( #7799 )
...
* Fix running Pipeline with conditional branch and Component with default inputs
* Add release notes
* Change arg name of _init_to_run so it's clearer
* Enhance release note
2024-06-06 13:19:07 +00:00
David S. Batista
ce9b0ecb19
fix: EvaluationRunResult.score_report() is missing the metrics column ( #7817 )
...
* fixing the DataFrame with the aggregated scores
* fixing tests
2024-06-06 14:33:45 +02:00
Silvano Cerza
23011c215e
chore: Change trafilatura dependency to use lazy import ( #7809 )
...
* Change trafilatura dependency to use lazy import
* Add release notes
2024-06-05 18:04:24 +02:00
Sebastian Husch Lee
d815c78198
feat: Add TransformersTextRouter component ( #7801 )
...
* First pass at adding TransformerTextRouter
* Fix tests
* Add release notes
* Add optional labels param
* Add verification in the warm_up
* Fix tests
* Add labels to to_dict
* Feedback from review
* Add component to docs
* Added extra tests
2024-06-05 15:28:53 +02:00
Sebastian Husch Lee
e6b8b7529b
Update doc string of pypdf.py ( #7805 )
2024-06-05 10:45:34 +00:00
David S. Batista
19c1e2e61d
docs: fixing ContextRelevance documentation ( #7802 )
...
* fixing ContextRelevance documentation
2024-06-04 18:02:09 +02:00
Vladimir Blagojevic
678f193f10
feat: Add filter_policy init parameter to in memory retrievers ( #7795 )
...
* Add filter_policy init parameter to in-memory retrievers
2024-06-04 17:51:16 +02:00
Silvano Cerza
fd838fc573
Update indexing and rag default templates to use InMemoryDocumentStore ( #7782 )
2024-06-04 12:57:33 +02:00
Stefano Fiorucci
55a657ba81
export ChatPromptBuilder and add it to pydoc config ( #7796 )
2024-06-04 10:17:23 +02:00
Silvano Cerza
26b263e349
Fix InMemoryDocumentStore not sharing some document stats with other instances ( #7792 )
2024-06-04 10:15:50 +02:00
Silvano Cerza
74df8ed937
test: Rework Pipeline.run() tests to ease declaration with dataclasses ( #7790 )
...
* Rework boilerplate function that run Pipeline in scenarios testing
* Update tests to use new dataclasses
* Update README.md to reflect dataclass changes
* Use absolute import from conftest
2024-06-03 15:59:42 +02:00
Daria Fokina
67abe5576b
add examples to preprocessors ( #7780 )
2024-06-03 15:42:21 +02:00
Silvano Cerza
07ae45e0c2
test: Migrate Pipeline.run() tests with run arguments ( #7777 )
...
* Support Pipeline.run() arguments in tests
* Move intermediate outputs
2024-06-03 12:36:04 +02:00
Silvano Cerza
854c4173f2
feat: Add memory sharing between different instances of InMemoryDocumentStore ( #7781 )
...
* Add memory sharing between different instances of InMemoryDocumentStore
* Fix FilterRetriever tests
* Fix InMemoryBM25Retriever tests
2024-05-31 16:44:14 +02:00
Silvano Cerza
d81af81fbb
test: Migrate pipeline run tests ( #7775 )
...
* Move complex pipeline
* Move pipeline with default
* Move pipeline with distinct loops
* Move pipeline with double loop
* Move pipeline with dynamic inputs
* Move fixed decision pipeline
* Move fixed merging pipeline
* Move fixed decision and merge pipeline
* Remove test_joiners.py
* Move looping and merge pipeline
* Remove test_looping.py
* Move mutable input pipeline
* Move parallel branches pipeline
* Move same input different components pipeline
* Move test_run_with_greedy_variadic_after_component_with_default_input_simple
* Remove test_run_raises_if_max_visits_reached
* Move test_run_with_component_that_does_not_return_dict
* Move test_correct_execution_order_of_components_with_only_defaults
* Move test_pipeline_is_not_stuck_with_components_with_only_defaults
* Move test_pipeline_is_not_stuck_with_components_with_only_defaults_as_first_components
* Move self loop pipeline
* Move variable decision and merge pipeline
* Remove test_variable_decision_pipeline
* Move variable merging pipeline
* Add FakeComponent removed by mistake
2024-05-31 13:00:29 +02:00
Massimiliano Pippi
aa767ae142
ignore rc0 ( #7776 )
2024-05-31 12:32:08 +02:00
Silvano Cerza
a9f989d756
test: Support multiple runs for Pipeline run tests ( #7762 )
...
* Support multiple runs for Pipeline run tests
* Apply suggestions from code review
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-05-31 11:58:49 +02:00
Julian Risch
6723dc3801
check for RuntimeError instead of ComponentError in test ( #7769 )
2024-05-31 08:42:40 +02:00
Massimiliano Pippi
8e3a8999de
fix release workflow
2024-05-30 18:47:42 +02:00
Haystack Bot
6425b05e50
Update unstable version to 2.3.0-rc0 ( #7774 )
...
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
v2.3.0-rc0
2024-05-30 18:09:29 +02:00
Massimiliano Pippi
131e3498cd
fix release workflow
2024-05-30 18:08:06 +02:00
Massimiliano Pippi
c96741796a
fix release workflow
2024-05-30 18:06:15 +02:00
Daria Fokina
f8646e1186
update version when the components are to be removed ( #7773 )
2024-05-30 15:35:20 +00:00
Massimiliano Pippi
8d80ff86d9
Add BranchJoiner and deprecate Multiplexer ( #7765 )
2024-05-30 15:34:52 +02:00
Silvano Cerza
5c468feecf
test: Update Pipeline.run() tests README.md ( #7757 )
...
* Update Pipeline.run() tests README.md
* Add suggestion from review
* Fix typos
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-05-29 14:28:42 +02:00
Massimiliano Pippi
0ceeb733ba
chore: make warm_up() usage consistent ( #7752 )
...
* make usage consistent
* fix error type
* release notes
* pylint fix
* change of plan
* revert
* fix test
* revert
* fix HF tests
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* fix formatting
* reformat
* fix regex match with the new error message
* fix integration test
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-05-29 10:54:21 +02:00
Silvano Cerza
15aa4217bd
Install hatch in testing jobs ( #7755 )
2024-05-28 17:04:21 +02:00
Massimiliano Pippi
cc521f42ef
ci: remove dependency cache job ( #7754 )
...
* remove dependency cache job
* leftover
2024-05-28 16:03:59 +02:00
Silvano Cerza
3dcc21fd73
test: Pipeline run tests rework ( #7748 )
...
* Rework Pipeline.run() tests
* Remove test_linear_pipeline.py
* Add test for components execution order
* Add new pytest-bdd tests dependency
* Update README.md
* Add function to dynamically add integration marker
* Fix marking tests as integration
2024-05-28 15:42:47 +02:00
Luke Bentley-Fox
9fe7eff42c
fix: use correct output annotation for pdfminer converter ( #7750 )
2024-05-27 21:04:40 +02:00
Alessio Cesaretti
d0da31a047
feat: Add split_threshold to DocumentSplitter to avoid excessively short splits ( #7721 )
...
* feat: add split_threshold to document splitter to avoid excessively small splits
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* extend release note
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-05-27 14:48:38 +02:00
Silvano Cerza
22289f590f
Move tests from test_connect.py in test_pipeline.py and test_utils.py ( #7742 )
2024-05-24 16:41:38 +02:00
Silvano Cerza
f5becf2ac0
Fix NamedEntityExtractor crashing in Python 3.12 if constructed using a string backend argument. ( #7743 )
2024-05-24 16:41:29 +02:00
tstadel
98fd270428
feat: add ChatPromptBuilder, deprecate DynamicChatPromptBuilder ( #7663 )
2024-05-23 19:04:55 +02:00
Silvano Cerza
4bc62854a9
test: Fix telemetry tests so they don't fail ( #7708 )
...
* Fix telemetry tests so they don't fail
* Remove test
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-05-23 18:02:25 +02:00
David S. Batista
38747ff7a3
fix: failsafe for non-valid json and failed LLM calls ( #7723 )
...
* wip
* initial import
* adding tests
* adding params
* adding safeguards for nan in evaluators
* adding docstrings
* fixing tests
* removing unused imports
* adding tests to context and faithfulness evaluators
* fixing docstrings
* nit
* removing unused imports
* adding release notes
* attending PR comments
* fixing tests
* fixing tests
* adding types
* removing unused imports
* Update haystack/components/evaluators/context_relevance.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Update haystack/components/evaluators/faithfulness.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* attending PR comments
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-05-23 15:41:29 +00:00
Massimiliano Pippi
e3dccf4406
add timeout to AzureOpenAIGenerator ( #7724 )
...
* add timeout to AzureOpenAIGenerator
* add to chat also
* Update azure-openai-generator-timeout-c39ecd6d4b0cdb4b.yaml
2024-05-23 16:28:24 +02:00