* Adding filters param to MostSimilarDocumentsPipeline run and run_batch
* Adding index param to MostSimilarDocumentsPipeline run and run_batch
* Adding index param documentation to MostSimilarDocumentsPipeline run and run_batch
* Updated index param documentation to MostSimilarDocumentsPipeline run and run_batch. Updated type: ignore in run_batch
* don't send the list of inputs back as an output in the running of a node.
* updated documentation
* Update pydoc-markdown.py
* added test case for pipeline join fix
Co-authored-by: JeffRisberg <jrisberg@aol.com>
* fix milvus and faiss tests not running
* fix schema manually
* fix test_dpr_embedding test for milvus
* pip freeze on milvus tests
* fix milvus1 tests being executed: fix all_doc_stores order
* Revert "pip freeze on milvus tests"
This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab.
* make infer_required_doc_store more robust
* don't skip tests without docstore requirements
* use markers for docstore tests
* quick fix benchmark runs to make them work with current haystack version
* fix minor typo
* update readme. fix minor things to make benchmarks run again
* Update Documentation & Code Style
* fix typo in readme
* update result files for reader and retriever querying
* reduce batch size for update embeddings to prevent xlarge bulk_update requests that exceed elastic's limits (happening in dense 500k runs)
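The batch-size reduction above can be illustrated with a minimal sketch. It assumes the embeddings are updated in fixed-size slices so that no single bulk request grows too large; `batched` is a hypothetical helper, not the function used in the benchmarks.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def batched(items: Sequence[T], batch_size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `batch_size` items (hypothetical helper)."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Each slice would be sent as one bulk request; the last one may be shorter.
batches = list(batched(list(range(5)), batch_size=2))
```

A smaller `batch_size` means more requests, but each one stays below the size limit of the backend.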
* change default memory allocation back to normal. add note to readme
* add first indexing results
* add memory to docker cmd
* full benchmarks results on commit c5a2651fcbbeffca06ffa9036b10e62669bcc1b0
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Use the %s syntax on all debug messages
* Use the %s syntax on some more debug messages
* Use the %s syntax on info messages
* Use the %s syntax on warning messages
* Use the %s syntax on error and exception messages
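A short sketch of why the `%s` syntax matters for log calls: with an f-string the message is formatted before `logging` checks the level, while `%s` arguments are only formatted if the record is actually emitted. `Expensive` is a hypothetical class used only to count string conversions.

```python
import logging

logger = logging.getLogger("example")
logger.setLevel(logging.WARNING)  # DEBUG messages are disabled


class Expensive:
    """Hypothetical object that counts how often it is converted to a string."""

    def __init__(self) -> None:
        self.str_calls = 0

    def __str__(self) -> str:
        self.str_calls += 1
        return "expensive"


obj = Expensive()

# f-string: the message is built eagerly, even though it is never logged.
logger.debug(f"Processing {obj}")
eager_calls = obj.str_calls

# %s syntax: formatting is deferred and skipped because DEBUG is disabled.
logger.debug("Processing %s", obj)
lazy_calls = obj.str_calls - eager_calls
```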
* mypy
* pylint
* trigger tutorials execution in CI
* trigger tutorials execution on CI
* black
* remove embeddings from repr
* fix Document `__repr__`
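One way to keep large embeddings out of a `__repr__` is sketched below with a dataclass field marked `repr=False`. `Doc` is a hypothetical stand-in and does not reproduce the actual `Document` class or how this commit implements the change.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Doc:
    """Hypothetical stand-in for a document with an optional embedding."""

    content: str
    # repr=False keeps the (potentially very long) vector out of repr() output.
    embedding: Optional[List[float]] = field(default=None, repr=False)


doc = Doc(content="hello", embedding=[0.1] * 768)
text = repr(doc)
```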
* address feedback
* mypy
* feat(PDFToTextConverter): add option to get text in physical layout order
* test: add physical layout extraction test to PDFToTextConverter
* refactor: change layout parameter attribution places
* docs: manually trigger pre-commits
* docs: generate new docs to comply with pydoc-markdown style
* refactor: improve support for dataclasses
* refactor: refactor class init
* refactor: remove unused import
* refactor: testing 3.7 diffs
* refactor: checking meta where is Optional
* refactor: reverting some changes on 3.7
* refactor: remove unused imports
* build: manual pre-commit run
* doc: run doc pre-commit manually
* refactor: post initialization hack for 3.7-3.10 compat.
TODO: investigate another method to improve 3.7 compatibility.
* doc: force pre-commit
* refactor: refactored for both Python 3.7 and 3.9
* docs: manually run pre-commit hooks
* docs: run api docs manually
* docs: fix wrong comment
* refactor: change no type-checked test code
* docs: update primitives
* docs: api documentation
* docs: api documentation
* refactor: minor test refactoring
* refactor: remove unused enumeration on test
* refactor: remove unneeded dir in gitignore
* refactor: exclude all private fields and change meta def
* refactor: add pydantic comment
* refactor: fix for mypy on Python 3.7
* refactor: revert custom init
* docs: update docs to new pydoc-markdown style
* Update test/nodes/test_generator.py
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* not working draft
* first draft
* fix
* revert json schema
* better schema
* improvements, support different python versions
* little simplification
* improvements and more tests
* Revert "Merge branch 'handle_optional_params' into origin/main"
This reverts commit 0114cba1f72c9bab23a3ce6a24cb4b346834cf34.
* fix git mess
* handle optional params; schema
* test null values
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Set translated text on a copy of original document
* Return new translated list
* Manually generated docs
TODO: check pre-commit
* Hook generated file
* Rename variables for better maintenance
* fix(translator): prevent inputs from being changed
* fix: manual update translator docs
* style(translator): explicit type declaration on List
* docs(translator): re-run pre-commit hook
* style(translator): ignore mypy wrong type check
* docs(translator): re-run pre-commit hook
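The translator fix above (set the translated text on a copy and return a new list) can be sketched as follows. Plain dicts stand in for documents, and `translate_documents` and its `translate` callable are hypothetical names, not the translator node's API.

```python
from copy import deepcopy
from typing import Callable, Dict, List


def translate_documents(
    documents: List[Dict[str, str]], translate: Callable[[str], str]
) -> List[Dict[str, str]]:
    """Return translated copies and leave the input documents untouched."""
    translated = []
    for doc in documents:
        new_doc = deepcopy(doc)  # copy first, so the caller's object is not mutated
        new_doc["content"] = translate(doc["content"])
        translated.append(new_doc)
    return translated


originals = [{"content": "hallo"}]
# str.upper is only a placeholder for a real translation function.
result = translate_documents(originals, str.upper)
```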
* Raise error upon duplicate document key found within meta info
* value error msg fix
* Update Documentation & Code Style
* Raise exception instead of asserting
* Update Documentation & Code Style
* add test
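A minimal sketch of raising an exception instead of asserting on duplicate keys. It assumes the check happens when meta fields are merged into a flat dictionary; `flatten_document` is a hypothetical helper. Unlike `assert`, an explicit `raise` is not stripped when Python runs with `-O`.

```python
from typing import Any, Dict


def flatten_document(fields: Dict[str, Any], meta: Dict[str, Any]) -> Dict[str, Any]:
    """Merge meta into the top-level fields, refusing to overwrite existing keys."""
    duplicates = sorted(set(fields) & set(meta))
    if duplicates:
        # An explicit exception survives `python -O`, an assert statement does not.
        raise ValueError(f"Duplicate keys in document fields and meta: {duplicates}")
    return {**fields, **meta}


flat = flatten_document({"content": "text"}, {"source": "wiki"})
```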
* added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore
* Add documentation on metadata filtering in docstring
* docs
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Enable the `JoinDocuments` node to work with documents with `score=None`
This fixes #2983
As of now, the `JoinDocuments` node will error out if any of the documents has `score=None` - which is possible, as some retrievers are not able to provide a score, like the `TfidfRetriever` on Elasticsearch or the `BM25Retriever` on Weaviate.
The reason for the error is that `JoinDocuments` always sorts the documents by score and cannot sort when `score=None`.
There was a very similar issue for `JoinAnswers` too, which was addressed by this PR: https://github.com/deepset-ai/haystack/pull/2436
This PR applies the same solution to `JoinDocuments`, so both `JoinAnswers` and `JoinDocuments` now have the same additional argument to disable sorting when that is required.
The solution is to add an argument to `JoinDocuments` called `sort_by_score: bool`, which allows the user to turn off the sorting of documents by score, but keeps the current functionality of sorting being performed as the default.
* Fixing test bug
* Addressing PR review comments
- Extending unit tests
- Simplifying logic
* Making the sorting work even with no scores
By treating a missing score as -Inf when sorting
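The two ideas above, a `sort_by_score` switch and a missing score treated as -Inf, can be sketched together. `(name, score)` tuples stand in for documents and `sort_documents` is a hypothetical function, not the code in `join_docs.py`.

```python
from math import inf
from typing import List, Optional, Tuple

ScoredDoc = Tuple[str, Optional[float]]


def sort_documents(docs: List[ScoredDoc], sort_by_score: bool = True) -> List[ScoredDoc]:
    """Sort by descending score; documents without a score end up last."""
    if not sort_by_score:
        return list(docs)  # keep the incoming order
    # None cannot be compared with floats, so map it to -inf for the sort key.
    return sorted(
        docs,
        key=lambda doc: doc[1] if doc[1] is not None else -inf,
        reverse=True,
    )


docs = [("a", 0.2), ("b", None), ("c", 0.9)]
ranked = sort_documents(docs)
unsorted = sort_documents(docs, sort_by_score=False)
```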
* Forgot to commit the change in `join_docs.py`
* [EMPTY] Re-trigger CI
* Added an INFO log if the `JoinDocuments` is sorting while some of the docs have `score=None`
* Adjusting the arguments of `any()`
* [EMPTY] Re-trigger CI
* Refactoring the `RayPipeline.run` method - merging it with the `Pipeline.run`
This is to fix #2968
* Bug: variable `i` was already in use
* Removing unused imports
* Removing unused import
* [EMPTY] Re-trigger CI
* Addressing concerns raised pre-review
- Removing the attempt to make it work without the need for `JoinDocuments` - it is okay to fail without `JoinDocuments` for certain pipelines.
* Refactoring based on reviews
* Adding support for additional distance metrics for Weaviate
Fixes #3000
* Updating the docs
* Fixing error texts
* Fixing issues raised by the review
* Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity`
* [EMPTY] Re-trigger CI
* Fixing things based on review
* [EMPTY] Re-trigger CI
* feat: fetch results for DeepsetCloudExperiments
* chore: test DC fetch predictions for eval run
* chore: switch to dict iteration with .items()
* chore: update DC url to fetch predictions from
* chore: update doc strings for fetching eval run results
* chore: update DeepsetCloudExperiments description, change function names for fetching predictions of an eval run
* chore: test for DeepsetCloudExperiments.get_run_results
* chore: adjust request mock for test_get_eval_run_results
* chore: push first row of dataframe into variable for test checks
* chore: adjust mock data to correct data types
* chore: make documentation more readable with line breaks
* chore: update documentation for eval run result fetching
* use hashlib.md5() instead of (interpreter dependent) hash() function to generate MultiLabel id
* add tests to assess constancy of MultiLabel.id
* make test_multilabel_id test ensure that MultiLabel ids are always the same
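A brief sketch of why `hashlib.md5()` gives a constant id where the built-in `hash()` does not: string hashes from `hash()` are salted per interpreter process unless `PYTHONHASHSEED` is fixed, while an MD5 digest depends only on the input bytes. `stable_id` is a hypothetical helper; how the real id input is assembled is not shown in this log.

```python
import hashlib
from typing import List


def stable_id(parts: List[str]) -> str:
    """Build an id that is identical across interpreter runs (hypothetical helper)."""
    joined = "".join(parts)
    # The digest depends only on the bytes, not on the running interpreter.
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


first = stable_id(["label-1", "label-2"])
second = stable_id(["label-1", "label-2"])
```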
* Add page number to Documents coming from PDFConverters and PreProcessor
* Fix mypy
* Update API Docs
* Update API Docs
* Remove unused imports
* Generate JSON schema
* Generate JSON schema
* Make test variable shorter
* Make regex a separate function
* Move counting of page breaks to a function
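Counting page breaks to derive a page number might look like the sketch below. It assumes pages in the extracted text are separated by a form feed character (`\f`); `page_number_at` is a hypothetical helper, not the function added in this change.

```python
def page_number_at(text: str, offset: int) -> int:
    """Return the 1-based page containing `offset`, assuming '\\f' separates pages."""
    # Every form feed before the offset marks the end of one earlier page.
    return text.count("\f", 0, offset) + 1


text = "page one\fpage two\fpage three"
first_page = page_number_at(text, 0)
second_page = page_number_at(text, text.index("two"))
third_page = page_number_at(text, text.index("three"))
```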
* Generate JSON schema
* Apply suggestions from code review
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Update API Documentation
* Don't create instance for testing staticmethod
* Update haystack/nodes/preprocessor/preprocessor.py
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* fix validation for dynamic outgoing edges
* Update Documentation & Code Style
* use class outgoing_edges as fallback if no instance is provided
* implement classmethod approach
* readd comment
* fix mypy
* fix tests
* set outgoing_edges for all components
* set outgoing_edges for mocks too
* set document store outgoing_edges to 1
* set last missing outgoing_edges
* enforce BaseComponent subclasses to define outgoing_edges
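One possible way to force subclasses to define `outgoing_edges` is a check in `__init_subclass__`, sketched below. The `BaseComponent` here is a hypothetical stand-in; whether the real class uses this mechanism is an assumption.

```python
class BaseComponent:
    """Hypothetical stand-in for a pipeline component base class."""

    outgoing_edges: int  # annotation only: subclasses must assign a value

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Runs once per subclass definition, so a missing attribute fails early.
        if not hasattr(cls, "outgoing_edges"):
            raise ValueError(f"{cls.__name__} must define 'outgoing_edges'")


class Retriever(BaseComponent):
    outgoing_edges = 1


try:
    class Broken(BaseComponent):
        pass

    error = None
except ValueError as exc:
    error = str(exc)
```

With this approach the error surfaces at class definition time rather than when a pipeline is run.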
* override _calculate_outgoing_edges for FileTypeClassifier
* remove superfluous test
* set rest_api's custom component's outgoing_edges
* Update docstring
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* remove unnecessary else
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* enable Opensearch unit tests under Win
* move unit tests into a dedicated job
* skip audio tests on missing dependencies
* avoid failing test collection when soundfile is not available
* Update .github/workflows/tests.yml
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Ability to run Ray Serve detached
Fixes #2944
Ability to run Ray Serve detached - to allow running multiple instances of the app (HA).
See https://docs.ray.io/en/latest/serve/package-ref.html#core-apis
* Generating the docs
* Re-trigger the CI pipeline
* Retrigger the CI Pipeline
* Typo in docstrings
* Fixing docstring and typing issues
* Regenerating docs
* [EMPTY] Re-trigger CI
* [EMPTY] Re-trigger CI
* Refactoring to allow any number of args for the `serve.start()` method
There seem to be additional arguments to the `serve.start()` method, so we should probably cover all of them at once, instead of only the `detached` option.
* [EMPTY] Re-trigger CI
* Test whether the ServeControllerClient in fact has the supplied `detached` parameter