Silvano Cerza
07ae45e0c2
test: Migrate Pipeline.run() tests with run arguments ( #7777 )
...
* Support Pipeline.run() arguments in tests
* Move intermediate outputs
2024-06-03 12:36:04 +02:00
Silvano Cerza
854c4173f2
feat: Add memory sharing between different instances of InMemoryDocumentStore ( #7781 )
...
* Add memory sharing between different instances of InMemoryDocumentStore
* Fix FilterRetriever tests
* Fix InMemoryBM25Retriever tests
2024-05-31 16:44:14 +02:00
Silvano Cerza
d81af81fbb
test: Migrate pipeline run tests ( #7775 )
...
* Move complex pipeline
* Move pipeline with default
* Move pipeline with distinct loops
* Move pipeline with double loop
* Move pipeline with dynamic inputs
* Move fixed decision pipeline
* Move fixed merging pipeline
* Move fixed decision and merge pipeline
* Remove test_joiners.py
* Move looping and merge pipeline
* Remove test_looping.py
* Move mutable input pipeline
* Move parallel branches pipeline
* Move same input different components pipeline
* Move test_run_with_greedy_variadic_after_component_with_default_input_simple
* Remove test_run_raises_if_max_visits_reached
* Move test_run_with_component_that_does_not_return_dict
* Move test_correct_execution_order_of_components_with_only_defaults
* Move test_pipeline_is_not_stuck_with_components_with_only_defaults
* Move test_pipeline_is_not_stuck_with_components_with_only_defaults_as_first_components
* Move self loop pipeline
* Move variable decision and merge pipeline
* Remove test_variable_decision_pipeline
* Move variable merging pipeline
* Add FakeComponent removed by mistake
2024-05-31 13:00:29 +02:00
Silvano Cerza
a9f989d756
test: Support multiple runs for Pipeline run tests ( #7762 )
...
* Support multiple runs for Pipeline run tests
* Apply suggestions from code review
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-05-31 11:58:49 +02:00
Massimiliano Pippi
8d80ff86d9
Add BranchJoiner and deprecate Multiplexer ( #7765 )
2024-05-30 15:34:52 +02:00
Silvano Cerza
5c468feecf
test: Update Pipeline.run() tests README.md ( #7757 )
...
* Update Pipeline.run() tests README.md
* Add suggestion from review
* Fix typos
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-05-29 14:28:42 +02:00
Massimiliano Pippi
0ceeb733ba
chore: make warm_up() usage consistent ( #7752 )
...
* make usage consistent
* fix error type
* release notes
* pylint fix
* change of plan
* revert
* fix test
* revert
* fix HF tests
* Apply suggestions from code review
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* fix formatting
* reformat
* fix regex match with the new error message
* fix integration test
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-05-29 10:54:21 +02:00
Silvano Cerza
3dcc21fd73
test: Pipeline run tests rework ( #7748 )
...
* Rework Pipeline.run() tests
* Remove test_linear_pipeline.py
* Add test for components execution order
* Add new pytest-bdd tests dependency
* Update README.md
* Add function to dynamically add integration marker
* Fix marking tests as integration
2024-05-28 15:42:47 +02:00
Alessio Cesaretti
d0da31a047
feat: Add split_threshold to DocumentSplitter to avoid excessively short splits ( #7721 )
...
* feat: add split_threshold to document splitter to avoid excessively small splits
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* extend release note
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
2024-05-27 14:48:38 +02:00
Silvano Cerza
22289f590f
Move tests from test_connect.py into test_pipeline.py and test_utils.py ( #7742 )
2024-05-24 16:41:38 +02:00
tstadel
98fd270428
feat: add ChatPromptBuilder, deprecate DynamicChatPromptBuilder ( #7663 )
2024-05-23 19:04:55 +02:00
Silvano Cerza
4bc62854a9
test: Fix telemetry tests so they don't fail ( #7708 )
...
* Fix telemetry tests so they don't fail
* Remove test
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-05-23 18:02:25 +02:00
David S. Batista
38747ff7a3
fix: failsafe for non-valid json and failed LLM calls ( #7723 )
...
* wip
* initial import
* adding tests
* adding params
* adding safeguards for nan in evaluators
* adding docstrings
* fixing tests
* removing unused imports
* adding tests to context and faithfulness evaluators
* fixing docstrings
* nit
* removing unused imports
* adding release notes
* attending PR comments
* fixing tests
* fixing tests
* adding types
* removing unused imports
* Update haystack/components/evaluators/context_relevance.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Update haystack/components/evaluators/faithfulness.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* attending PR comments
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-05-23 15:41:29 +00:00
Massimiliano Pippi
e3dccf4406
add timeout to AzureOpenAIGenerator ( #7724 )
...
* add timeout to AzureOpenAIGenerator
* add to chat also
* Update azure-openai-generator-timeout-c39ecd6d4b0cdb4b.yaml
2024-05-23 16:28:24 +02:00
tstadel
83d3970405
feat: extend PromptBuilder and deprecate DynamicPromptBuilder ( #7655 )
...
* feat: add default template to DynamicPromptBuilder
* fix mypy
* fix mypy
* extend PromptBuilder and deprecate DynamicPromptBuilder
* make backward-compatible: optional -> required
* make backward-compatible: _template_string
* make backward-compatible: missing_required_vars error
* add test for no template case
* better docstrings
* some chores
* some chores
* add reno
* revert test_dynamic_prompt_builder.py
* better docstring
* make backward-compatible: reorder init args
* fix tests
* add raises docstring
* make default template required and rework docstrings
* docs chores
* keep to_dict in place for easier review
* remove unnecessary logger
* update docstring
2024-05-23 16:03:39 +02:00
Varun Krishnan
badb05b3ab
feat: allow DocumentJoiner to accept top_k parameter in run method ( #7709 )
...
* feat: allow DocumentJoiner to accept top_k parameter in run method
* Added release note for DocumentJoiner top_k fix
2024-05-23 16:03:26 +02:00
Massimiliano Pippi
482f60ec99
fix: exit early if the component receives no documents ( #7732 )
...
* exit early if the component receives no documents
* relnote
2024-05-23 09:35:10 +02:00
David S. Batista
a4fc2b66e6
style: adding progress bar to llm-based evaluators ( #7726 )
...
* adding progress bar
* fixing typo
* fixing tests
* Update test_llm_evaluator.py
* fixing missing colon
* passing directly to parent
* adding docstrings
2024-05-23 09:22:14 +02:00
Massimiliano Pippi
76224fc781
make SerperDevWebSearch more robust ( #7725 )
2024-05-22 13:14:39 +02:00
Silvano Cerza
da088140ab
Group up Pipeline unit tests in a single class ( #7706 )
2024-05-21 16:12:28 +02:00
Stefano Fiorucci
7181f6b7e9
feat: change HTML conversion backend from boilerpy3 to Trafilatura ( #7705 )
...
* change HTML conversion backend to Trafilatura
* rm unused var
2024-05-17 10:38:47 +02:00
Carlos Fernández
57af95d7ea
add keep-id to DocumentCleaner ( #7703 )
2024-05-16 19:18:48 +02:00
Carlos Fernández
686a4999cf
feat: widen support of env vars in OpenAI components ( #7653 )
...
* add environment variables to the _environment.py file
* add support for two of the three variables
* Add support for 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' in OpenAIDocumentEmbedder.
* Replicate support for env vars in OpenAITextEmbedder.
* Add support for env vars in OpenAIGenerator.
* Add support for env vars in OpenAIChatGenerator.
* add docstrings and reno
* add params to __init__ in OpenAIDocumentEmbedder
* add params to __init__ in OpenAITextEmbedder
* make fully functional implementation of env vars and unit tests
* update reno
* Update haystack/components/embedders/openai_text_embedder.py
* revert changes to telemetry/_environment.py
* Update haystack/components/embedders/openai_text_embedder.py
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
2024-05-15 21:58:41 +00:00
Vladimir Blagojevic
c8d53b3ebf
fix: Adjust serialization to handle PEP-585 generic types ( #7690 )
...
* Adjust serialization to handle PEP-585 generic types
* Add reno note
* Simplify
* PEP 585 serialization handling in sys.version_info < (3, 9)
2024-05-15 14:25:19 +02:00
David S. Batista
96b9d3e32a
fix: Adding missing component decorator to AzureOpenAIGenerator ( #7698 )
...
* initial import
* adding release notes
* tests avoiding I/O operations
* Update fix-azure-generators-serialization-18fcdc9cbcb3732e.yaml
2024-05-15 10:00:38 +02:00
Massimiliano Pippi
cc1d4b1c80
chore: Simplify Pipeline.run method by moving code to the base class ( #7680 )
...
* move graph initialization to the base class
* simplify data normalization
* deepcopy data in base class
* initialize inputs state
* move to_run preparation to the base class
* Test Pipeline._init_to_run()
* Test Pipeline._init_inputs_state()
* Test Pipeline._prepare_component_input_data()
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
2024-05-14 23:25:46 +02:00
David S. Batista
798dc4a4a5
fix: avoid FaithfulnessEvaluator and ContextRelevanceEvaluator returning NaN ( #7685 )
...
* initial import
* fixing tests
* relaxing condition
* adding safeguard for ContextRelevanceEvaluator as well
* adding release notes
2024-05-14 17:08:51 +02:00
Madeesh Kannan
2428bc2a92
fix: Pipeline.run correctly returns all outputs when the include_outputs_from parameter is used ( #7697 )
...
* fix: `Pipeline.run` correctly returns all outputs when the `include_outputs_from` parameter is used
* Add release note
2024-05-14 12:29:41 +02:00
Vladimir Blagojevic
4352b1688e
fix: Fix NamedEntityExtractor serde ( #7684 )
...
* Fix NamedEntityExtractor serde
* Add release note
* Linting, remove unit markers
2024-05-14 12:24:55 +02:00
Sebastian Husch Lee
a2be90b95a
fix: Update device deserialization for components that use local models ( #7686 )
...
* fix: Update device deserialization for SentenceTransformersTextEmbedder
* Add unit test
* Fix unit test
* Make same change to doc embedder
* Add release notes
* Add same change to Diversity Ranker and Named Entity Extractor
* Add unit test
* Add the same for whisper local
* Update release notes
2024-05-14 08:36:14 +02:00
Vladimir Blagojevic
811b93db91
feat: Set ByteStream's mime_type attribute for web based resources ( #7681 )
2024-05-13 19:44:02 +02:00
Massimiliano Pippi
1d20ac3c5e
chore: extract BasePipeline ( #7673 )
...
* extract BasePipeline
* release note
* add missing headers
* move __eq__ to the base class
* proper check type equality, bless the tests
2024-05-10 11:35:15 +02:00
Massimiliano Pippi
10c675d534
chore: add license header to all modules ( #7675 )
...
* add license header to modules
* check license header at linting time
2024-05-09 13:40:36 +00:00
Stefano Fiorucci
7c9532b200
fix broken serialization of HFAPI components ( #7661 )
2024-05-08 17:14:37 +02:00
Stefano Fiorucci
94467149c1
fix: fix serialization of DocumentRecallEvaluator ( #7662 )
...
* fix serialization of DocumentRecallEvaluator
* add requested tests
2024-05-08 16:00:49 +02:00
Guest400123064
cd66a80ba2
perf: enhanced InMemoryDocumentStore BM25 query efficiency with incremental indexing ( #7549 )
...
* incorporating better bm25 impl without breaking interface
* all three bm25 algos
* 1. setting algo post-init not allowed; 2. remove extra underscore for naming consistency; 3. remove unused import
* 1. rename attribute name for IDF computation 2. organize document statistics as a dataclass instead of tuple to improve readability
* fix score type initialization (int -> float) to pass mypy check
* release note included
* fixing linting issues and mypy
* fixing tests
* removing heapq import and cleaning up logging
* changing indexing order
* adding more tests
* increasing tests
* removing rank_bm25 from pyproject.toml
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-05-03 12:10:15 +00:00
Vladimir Blagojevic
5f813373eb
chore: Update huggingface_hub classes used after library upgrade ( #7631 )
...
* Update huggingface_hub classes used after library upgrade
* Fix chat tests
* Update lazy import guard and other references to huggingface_hub>=0.23.0
* In huggingface_hub 0.23.0 TextGenerationOutput property details is now optional
* More fixes
* Add reno note
2024-05-03 10:14:54 +02:00
Julian Risch
b0284977db
feat: Add document page number of ExtractedAnswer to meta ( #7572 )
...
* calculate page number of answer and add to meta
* fix mypy, add reno
* add test
* simplify unit test
* update release note
* undo @patch updates
* extend tests, check page_number type
2024-05-02 14:48:27 +02:00
Mo
2e35f13085
feat: add converter based on pdfminer ( #7607 )
...
* Initial commit pdfminer converter
* Revert back naming of argument all_text per pdfminer documentation
* Add the component decorator
* Add release notes
* Reformat code with black
* Remove LTPage and comments
* Update dependencies in pyproject.toml
* Added some tests and incorporated reference doc in docstring
* Added some tests and incorporated reference doc in docstring
2024-05-02 10:36:54 +02:00
Julian Risch
2509eeea7e
refactor: Rename FaithfulnessEvaluator input responses to predicted_answers ( #7621 )
2024-04-30 16:30:57 +02:00
Vladimir Blagojevic
8cb3cecf34
feat: Trace pipeline run input/output data ( #7590 )
...
* Trace pipeline run
* Add reno note
* Update tracing tests to check input_data and output_data
* empty
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
2024-04-29 17:29:27 +02:00
Bohan Qu
40360e44ff
feat: add required flag for prompt builder inputs ( #7553 )
2024-04-29 14:21:53 +02:00
Carlos Fernández
d2c87b2fd9
feat: add page_number to metadata in DocumentSplitter ( #7599 )
...
* Add the implementation for page counting used in the v1.25.x branch. It should work as expected in issue #6705.
* Add tests that reflect the desired behaviour. This behaviour is inferred from the one it had in Haystack 1.x.
Solve some minor bugs spotted by tests.
* Update docstrings.
* Add reno.
* Update haystack/components/preprocessors/document_splitter.py
Update docstring from suggestion
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* solve suggestion to improve readability
* fragment tests
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* Update .gitignore
* Update .gitignore
* Update add-page-number-to-document-splitter-162e9dc7443575f0.yaml
* blackening
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
2024-04-29 12:51:18 +02:00
Madeesh Kannan
a881451d3a
refactor: Refactor EvaluationResult into BaseEvaluationRunResult and EvaluationRunResult ( #7594 )
...
The new `EvaluationRunResult` has slightly different semantics: it separates the previous `data` parameter into `inputs` and `results` and expects aggregate scores to be provided in the latter.
2024-04-25 12:16:48 +02:00
Madeesh Kannan
ec0e22265a
feat: Expand Pipeline.inputs and Pipeline.outputs to include connected sockets ( #7586 )
2024-04-24 12:27:18 +02:00
Stefano Fiorucci
19a46af9da
add __eq__ method to SparseEmbedding ( #7574 )
...
* add __eq__ method to SparseEmbedding
* reno
* improve reno
2024-04-23 19:03:41 +02:00
Julian Risch
9c56dbe288
test: Make ContextRelevanceEvaluator integration test more robust ( #7584 )
2024-04-23 16:01:25 +00:00
Julian Risch
07307709ee
test: Make FaithfulnessEvaluator integration test more robust ( #7582 )
2024-04-23 15:44:00 +00:00
Stefano Fiorucci
081757c6b9
test: replace mistral-7b with zephyr-7b-beta in tests ( #7576 )
...
* replace mistral-7b with gemma-2b-it in tests
* rm wrong comment
* change model
2024-04-23 13:56:07 +02:00
Julian Risch
d7638cfd4b
refactor: FaithfulnessEvaluator specifies inputs explicitly ( #7548 )
...
* specify inputs explicitly. move out examples
* Update haystack/components/evaluators/faithfulness.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
2024-04-22 12:52:10 +00:00