* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* update readme and contributing.md
* update contributing
* adjust example
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add inferencer for QA only
* Add latest docstring and tutorial changes
* Add QA inferencer tests
* Add type annotations for inferencer
* Fix type annotations, move util functions
* Fix type annotations
* Move fixtures to the top of the file
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Saves the FAISSDocumentStore init params to JSON at save() and loads them at load() if they're found. First draft, to be tested.
* Fixing issue with string/Path objects in a few string operations, thanks mypy
* Leverage self.set_config instead of saving the parameters in a separate attribute
* Modify test_faiss_and_milvus:test_faiss_index_save_and_load to test that init params are preserved
* Add assert to verify that the SQL doc count and FAISS vector count are equal. The name of the SQL db must always be specified for this to work
* Simplified the implementation a bit, add better comments
* Forgot a return at the end of the file
* Fixing some of the suggestions from the review
* Add a try-catch in the load method and fix the tests
* Typo
* feat: normalize embeddings for cosine sim
* WIP add test case for faiss cosine
* input to faiss normalize needs to be an array of vectors
* fix: test should compare correct result embedding to original embedding
* add sanity check for cosine sim
* fix typo
* normalize cosine score
* Update docstring
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Add type annotations in QuestionAnsweringHead
* Fix test by increasing max_seq_len
* Add SampleBasket type annotation
* Remove prediction head param from adaptive model init
* Add type ignore for AdaptiveModel init
* Fix and rename tests
* Adjust folder structure
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* Added support for Multi-GPU inference to DPR including benchmark
* fixed multi gpu
* added batch size to benchmark to better reflect multi gpu capabilities
* remove unnecessary entry in config.json
* fixed typos
* fixed config name
* update benchmark to use DEVICES constant
* changed multi gpu parameters and updated docstring
* adds silent fallback on cpu
* update doc string, warning and config
Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* [UPDT] delete_all_documents() replaced by delete_documents()
* [UPDT] warning logs to be fixed
* [UPDT] delete_all_documents() renamed and the same method added
Co-authored-by: Ram Garg <ramgarg102@gmai.com>
* Removing probability field from reader and from test cases
* Add switch to FARMReader to choose score/probability
* Remove probability field from doc returned by doc store
* Relax assertion testing joined es and dpr predictions
* Use switch for confidence scores also for no_answer
* Add test that checks switching to old answer scores > 10
* Normalize score in elastic doc store and reset reader.md
* Scale weights of JoinDocuments to sum to 1 and adapt test case
* Add FARM classification node
* Add classification output to meta field of document
* Update usage example
* Add test case for FARMClassifier
* Replace FARMRanker with FARMClassifier in documentation strings
* Remove base method not implemented by any child class, etc.
* [pipeline] Allow for batch indexing when using Pipelines fix #1168
* [pipeline] Test case fixed fix #1168
* [file_converter] Path.suffix updated #1168
* [file_converter] meta can be one of these three cases (#1168):
  - A single dict that is applied to all files
  - One dict for each file being converted
  - None
* [file_converter] mypy error fixed.
* [file_converter] mypy error fixed.
* [rest_api] batch file upload introduced in indexing API.
* [test_case] Test_api file upload parameter name updated.
* [ui] Streamlit file upload parameter updated.
* Annotation Tool: data is not persisted when using local version #853
* First version of weaviate
* First version of weaviate
* First version of weaviate
* Updated comments
* Updated comments
* ran query, get and write tests
* update embeddings, dynamic schema and filters implemented
* Initial set of tests and fixes
* Tests added for update_embeddings and delete documents
* introduced duplicate documents fix
* fixed mypy errors
* Added Weaviate to requirements
* Fix the weaviate docker env variables
* Fixing test dependencies for now
* Created weaviate test marker and fixed query
* Update docstring
* Add documentation
* Bump up weaviate version
* Bump up weaviate version in documentation
* Bump up weaviate version in documentation
* Upgrade weaviate version
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Fix duplicate question in Reader.eval()
* Add duplicate question support in document store
* Support duplicate questions in retriever eval
* Update tutorial
* Rename key_tuple
* Change error message
* Add warning when more than 6 labels
* Allow for label grouping options
* Add support for aggregating by label meta
* Satisfy mypy
* Make label field flexible, add docstrings
* Satisfy mypy
* Fix failing tests
* Adjust docstring
* Fix tutorial
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Adding ranker similar to retriever and reader
* Sort documents according to query-document similarity scores
* Reranking and model training runs for small example
* Added EvalRanker node
* Calculate recall@k in EvalRetriever and EvalRanker nodes
* Renaming EvalRetriever to EvalDocuments and EvalReader to EvalAnswers
* Added mean reciprocal rank as metric for EvalDocuments
* Fix bug that appeared when ranking documents with same score
* Remove commented code for unimplemented eval() of Ranker node
* Add documentation of k parameter in EvalDocuments
* Add Ranker documentation and rename top_k param
* [document_stores] Add a progress bar in update_embeddings() to track overall document progress closed #1037
* change 2nd level loop to docs. switch to tqdm.auto.
* [document_stores] Elasticsearch new method get_document_without_embedding_count() added.
* [test_case] Elasticsearch documentstore get_document_without_embedding_count() test case added.
* [document_stores] Add new bool arg in get_document_count() method and fixed #1082
* [document_stores] typo fixed #1082
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Use text hash as id to prevent document duplication. Also provide a way to customize it.
* Add latest docstring and tutorial changes
* Fixing duplicate value test when text is same
* Adding test for duplicate ids in document store
* Changing exception to generic Exception type
* add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Allow filtering of duplicate answers as implemented in FARM
* Changed default behavior to filtering exact duplicates
* Change expected test result due to filtering of duplicate answers by default
* Rounding expected test results for comparison with predictions