* Refactor table reader to use util functions to reduce code duplication.
* Expanding the tests for the table reader
* Adding types
* Updating tests to work for RCIReader
* Fix bug in RCIReader: it was saving the wrong queries list.
* Update _flatten_inputs to not change input variable
* Remove duplicate code
* Fixing broken BM25 support with Weaviate - fixes #3720
Unfortunately, BM25 support with Weaviate broke in Haystack v1.11.0+; this commit fixes it.
For details, see issue #3720.
* Fixing mypy issue - method signature wasn't matching the base class
* Mypy related test fix
Mypy forced me to make the signature of the Weaviate document store's `query` method match that of its parent, `KeywordDocumentStore`, where the `query` param is `Optional` but has NO default value, so it must be passed explicitly (as None) at runtime.
I am not sure why the abstract method's `query` param has no default value even though its type is `Optional`, but I didn't want to change that, so I changed the Weaviate tests instead.
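To illustrate the point above, here is a minimal sketch of an `Optional` parameter without a default value. The class names `KeywordStoreSketch` and `WeaviateStoreSketch` are hypothetical stand-ins, not the real Haystack classes, and the matching logic is invented for the example.

```python
from abc import ABC, abstractmethod
from typing import List, Optional


class KeywordStoreSketch(ABC):
    """Hypothetical stand-in for a base class such as `KeywordDocumentStore`."""

    @abstractmethod
    def query(self, query: Optional[str], top_k: int = 10) -> List[str]:
        # `Optional[str]` only means "str or None"; it does not add a default,
        # so callers still have to pass the argument.
        ...


class WeaviateStoreSketch(KeywordStoreSketch):
    """Hypothetical subclass whose signature matches the parent, as mypy requires."""

    def __init__(self, docs: List[str]) -> None:
        self.docs = list(docs)

    def query(self, query: Optional[str], top_k: int = 10) -> List[str]:
        if query is None:
            # No query given: return documents unfiltered.
            return self.docs[:top_k]
        return [d for d in self.docs if query.lower() in d.lower()][:top_k]


store = WeaviateStoreSketch(["BM25 ranking", "dense retrieval"])
all_docs = store.query(None)    # None has to be passed explicitly
matches = store.query("bm25")

try:
    store.query()               # omitting the argument raises TypeError
except TypeError:
    missing_arg_rejected = True
else:
    missing_arg_rejected = False
```

This is why the tests had to change: a call that leaves out `query` fails at runtime even though the type allows `None`.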
* Adding a note regarding an upcoming fix in Weaviate v1.17.0
* Apply suggestions from code review
* revert
* [EMPTY] Re-trigger CI
* first draft to add index param to tfidf
* better mypy handling
* Revert "better mypy handling"
This reverts commit 91a22516320f9dcbeae53827ec69f9dc51e1785c.
* new check in auto_fit
* new check also in retrieve
* better dict typings
* new test and improvements to other test
* remove unnecessary lambda
* improve test
* remove newline from openapi json
* fix test
* language fix
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* language fix 2
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* language fix 3
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* language fix 4
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* language fix 5
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* language fix 6
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* explicit index value handling
* fix test
* better error messages
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Fixing the `query_batch` method of the deepsetcloud document store - fixes #3722
* Trigger Build
* Trigger Build
* Trigger CI
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
* first try and new test
* fix test
* fix unused import
* remove comments
* no more dataclass
* add __eq__ and extend test
* better design from review
* Update schema.py
* fix black
* fix openapi
* fix openapi 2
* new try to fix openapi
* remove newline from openapi json
* fix for multilevel metadata dictionaries
* add metadata dict formatting to update function
* typing
* added check for labels meta
* added more info to input parameters
* added test for multilayer metadata
* removed todo
* Fix docstrings for DocumentStores
* Fix docstrings for AnswerGenerator
* Fix docstrings for Connector
* Fix docstrings for DocumentClassifier
* Fix docstrings for LabelGenerator
* Fix docstrings for QueryClassifier
* Fix docstrings for Ranker
* Fix docstrings for Retriever and Summarizer
* Fix docstrings for Translator
* Fix docstrings for Pipelines
* Fix docstrings for Primitives
* Fix Python code block spacing
* Add line break before code block
* Fix code blocks
* fix: discard metadata fields if not set in Weaviate (#3578)
* fix weaviate bug in returning embeddings and setting empty meta fields
* review comment
* Update unstable version and openapi schema (#3584)
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* fix: Flatten `DocumentClassifier` output in `SQLDocumentStore`; remove `_sql_session_rollback` hack in tests (#3273)
* first draft
* fix
* fix
* move test to test_sql
* test: add test to check id_hash_keys is not ignored (#3577)
* refactor: Generate JSON schema when missing (#3533)
* removed unused script
* print info logs when generating openapi schema
* create json schema only when needed
* fix tests
* Remove leftover
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
* move milvus tests to their own module (#3596)
* feat: store metadata using JSON in SQLDocumentStore (#3547)
* add warnings
* make the field cacheable
* review comment
* Pin faiss-cpu as 1.7.3 seems to have problems (#3603)
* Update Haystack imports (#3599)
* Update Python version (#3602)
* fix: `ParsrConverter` fails on pages without text (#3605)
* try to fix bug
* remove print
* leftover
* refactor: update Squad data (#3513)
* refactor the to_squad data class
* fix the validation label
* refactor the to_squad data class
* fix the validation label
* add the test for the to_label object function
* fix the tests for to_label_objects
* move all the test related to squad data to one file
* remove unused imports
* revert tiny_augmented.json
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
* Url fixes (#3592)
* add 2 example scripts
* fixing faq script
* fixing some urls
* removing example scripts
* black reformatting
* add labeler to the repo (#3609)
* convert eval metrics to python float (#3612)
* feat: add support for `BM25Retriever` in `InMemoryDocumentStore` (#3561)
* very first draft
* implement query and query_batch
* add more bm25 parameters
* add rank_bm25 dependency
* fix mypy
* remove tokenizer callable parameter
* remove unused import
* only json serializable attributes
* try to fix: pylint too-many-public-methods / R0904
* bm25 attribute always present
* convert errors into warnings to make the tutorial 1 work
* add docstrings; tests
* try to make tests run
* better docstrings; revert not running tests
* some suggestions from review
* rename elasticsearch retriever as bm25 in tests; try to test memory_bm25
* exclude tests with filters
* change elasticsearch to bm25 retriever in test_summarizer
* add tests
* try to improve tests
* better type hint
* adapt test_table_text_retriever_embedding
* handle non-textual docs
* query only textual documents
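For readers unfamiliar with BM25, the scoring idea behind the retriever support above can be sketched in plain Python. This is a textbook Okapi BM25 variant under stated assumptions (whitespace tokenization, hypothetical function name `bm25_scores`, default `k1` and `b` values chosen for illustration); it is not the `rank_bm25` code or the `InMemoryDocumentStore` implementation.

```python
import math
from collections import Counter
from typing import List


def bm25_scores(query: str, docs: List[str], k1: float = 1.5, b: float = 0.75) -> List[float]:
    """Score each document against the query with a textbook Okapi BM25 formula."""
    if not docs:
        return []

    # Assumption: lowercase whitespace tokenization, for illustration only.
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each term occurs.
    df: Counter = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1 and is normalized by document length via b.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


docs = ["the cat sat on the mat", "dogs chase cats", "the quick brown fox"]
scores = bm25_scores("cat", docs)
best = max(range(len(docs)), key=lambda i: scores[i])
```

Only the first document contains the exact token `cat`, so it is the only one with a non-zero score here; a real tokenizer would likely treat `cats` differently.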
* Incorporate Reviewer feedback
* refactor: replace `torch.no_grad` with `torch.inference_mode` (where possible) (#3601)
* try to replace torch.no_grad
* revert erroneous change
* revert other module breaking
* revert training/base
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
Co-authored-by: Espoir Murhabazi <espoir.mur@gmail.com>
Co-authored-by: Tuana Celik <tuana.celik@deepset.ai>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>