* Testing black on ui/
* Applying black on docstores
* Add latest docstring and tutorial changes
* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too
* Remove comments
* Relax constraints on pydoc-markdown
* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade
* Fix a couple of bugs
* Add a type: ignore that was missing somehow
* Give path to black
* Apply Black
* Apply Black
* Relocate a couple of type: ignore
* Update documentation
* Make Linux CI run after applying Black
* Triggering Black
* Apply Black
* Remove dependency, does not work well
* Remove manually double trailing commas
* Update documentation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* provide option to recreate es doc store on initialization
* Add latest docstring and tutorial changes
* Label expects more arguments
* Label expects also an answer
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Review changes
* Added the synonym analyser for search fields
* Added the review requests.
* Added the synonyms the OpenSearchDocumentStore and review requests.
* Fist attempt at using setup.cfg for dependency management
* Trying the new package on the CI and in Docker too
* Add composite extras_require
* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports
* Fix bug on class import and rephrase error message
* Introduce typing for optional modules and add type: ignore in sparse.py
* Include importlib_metadata backport for py3.7
* Add colab group to extra_requires
* Fix pillow version
* Fix grpcio
* Separate out the crawler as another extra
* Make paths relative in rest_api and ui
* Update the test matrix in the CI
* Add try catch statements around the optional imports too to account for direct imports
* Never mix direct deps with self-references and add ES deps to the base install
* Refactor several paths in tests to make them insensitive to the execution path
* Include tstadel review and re-introduce Milvus1 in the tests suite, to fix
* Wrap pdf conversion utils into safe_import
* Update some tutorials and rever Milvus1 as default for now, see #2067
* Fix mypy config
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* doc store should return all documents matching ids passed to get_documents_by_id
* test for get_document_by_id should be named correctly
* add test for get_documents_by_id
* Add latest docstring and tutorial changes
* document es query limit
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* minimal DCDocumentStore
* support filters
* implement get_documents_by_id
* handle not existing documents
* add docstrings
* auth added
* add tests
* generate docs
* Add latest docstring and tutorial changes
* add responses to dev dependencies
* fix tests
* support query() and quey_by_embedding()
* Add latest docstring and tutorial changes
* query tests added
* read api_key and api_endpoint from env
* Add latest docstring and tutorial changes
* support query() and quey_by_embedding()
* query tests added
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
* support dynamic similarity and return_embedding values
* Add latest docstring and tutorial changes
* adjust KeywordDocumentStore description
* refactoring
* Add latest docstring and tutorial changes
* implement get_document_count and raise on all not implemented methods
* Add latest docstring and tutorial changes
* don't use abbreviation DC in comments and errors
* Add latest docstring and tutorial changes
* docstring added to KeywordDocumentStore
* Add latest docstring and tutorial changes
* enhanced api key set
* split tests into two parts
* change setup.py in order to work around build cache
* added link
* Add latest docstring and tutorial changes
* rename DCDocumentStore to DeepsetCloudDocumentStore
* Add latest docstring and tutorial changes
* remove dc.py
* reinsert link to docs
* fix imports
* Add latest docstring and tutorial changes
* better test structure
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>
* Properly fix MetaDocumentORM and MetaLabelORM with composite foreign key constraints
* update_document_meta() was not using index properly
* Exclude ES and Memory from the cosine_sanity_check test
* move ensure_ids_are_correct_uuids in conftest and move one test back to faiss & milvus suite
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* align document store similarity functions
* remove unnecessary imports
* undone accidental change
* stopped weaviate from pretending to support dot product similarity
* stopped weaviate from pretending to support dot product similarity
* Add latest docstring and tutorial changes
* fix fixture params for document stores
* use cosine similarity for most tests
* fix cosine similarity test
* fix faiss test
* fix weaviate test
* fix accidental deletion
* fix document_store fixture
* test fix; shouldn't be merged
* fix test_normalize_embeddings_diff_shapes
* probably a better fix
* fix for parameter combinations
* revert new pytest_generate_tests functionality
* simplify pytest_generate_tests
* normalize embeddings for test_dpr_embedding
* add to faiss doc that embeddings are normalized
* Add latest docstring and tutorial changes
* remove unnecessary parameters and add comments
* simplify two lines of memory.py into one
* test similarity scores with smaller language model
* fix test_similarity_score
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Refactored code to unify vector_dim and embedding_dim parameter in DocumentStores
* Unit test cases updated to use `embedding_dim` instead of `vector_dim`
* Unit test case update to use embedding_dim instead of vector_dim
* Add latest docstring and tutorial changes
* Put usage of `vector_dim` param in same if-block as corresponding warning
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* retriever metrics added
* Add latest docstring and tutorial changes
* answer and document level matching metrics implemented
* Add latest docstring and tutorial changes
* answer related metrics for retriever
* basic reader metrics implemented
* handle no_answers
* fix typing
* fix tests
* fix tests without sas
* first draft for simulated top k
* rename sas and f1 columns in dataframe
* refactoring of EvaluationResult
* Add latest docstring and tutorial changes
* more eval tests added
* fix sas expected value precision
* distinction between ir and qa recall
* EvaluationResult.worst_queries() implemented
* print_evaluation_report() added
* eval report for QA Pipeline improved
* dynamic metrics for worst queries calc
* Add latest docstring and tutorial changes
* method names adjusted
* simple test for print_eval_report() added
* improved documentation
* Add latest docstring and tutorial changes
* minor formatting
* Add latest docstring and tutorial changes
* fix no_answer cases
* adjust one docstring
* Add latest docstring and tutorial changes
* fix no_answer cases for sas
* batchmode for sas implemented
* fix for retriever metrics if there are only no_answers
* fix multilabel tests
* improve documentation for pipeline.eval()
* streamline multilabel aggregates and docs
* Add latest docstring and tutorial changes
* fix multilabel tests
* unify document_id
* add dataframe schema description to EvaluationResult
* Add latest docstring and tutorial changes
* rename worst_queries to wrong_examples
* Add latest docstring and tutorial changes
* make query digesting standard pipelines work with pipeline.eval()
* Add latest docstring and tutorial changes
* tests for multi retriever pipelines added
* remove unnecessary import
* print_eval_report(): support all pipelines without junctions
* Add latest docstring and tutorial changes
* fix typos
* Add latest docstring and tutorial changes
* fix minor simulated_top_k bug and use memory documentstore throughout tests
* sas model param description improved
* Add latest docstring and tutorial changes
* rename recall metrics
* Add latest docstring and tutorial changes
* fix mean average precision link
* Add latest docstring and tutorial changes
* adjust sas description docstring
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Introduced an arg add synonyms to Elasticsearch
* Added the test code, removed the whitespace formatting changes, and overwrote the relevant parts from the already existing mapping instead of creating new mapping.
* Added the test code
* Remove whitespace change
* Added the doc_string with examples and link
* Removed unneccessary spaces
* Add latest docstring and tutorial changes
* fix text_field -> content_field
Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Filtering records not having embeddings
* Added support for skip_missing_embeddings Flag. Default behavior is throw error when embeddings are missing. If skip_missing_embeddings=True then documents without embeddings are ignored for vector similarity
* Fix for below error:
haystack/document_stores/elasticsearch.py:852: error: Need type annotation for "script_score_query"
* docstring for skip_missing_embeddings parameter
* Raise exception where no documents with embeddings is found for Embedding retriever.
* Default skip_missing_embeddings to True
* Explicitly check if embeddings are present if no results are returned by EmbeddingRetriever for Elasticsearch
* Added test case for based on Julian's input
* Added test case for based on Julian's input. Fix pytest error on the testcase
* Added test case for based on Julian's input. Fix pytest error on the testcase
* Added test case for based on Julian's input. Fix pytest error on the testcase
* Simplify code by using get_embed_count
* Adjust docstring & error msg slightly
* Revert error msg
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Add support for tables in SQLDocumentStore, FAISSDocumentStore and MilvuDocumentStore
* Add support for WeaviateDocumentStore
* Make sure that embedded meta fields are strings + add embedding_dim to WeaviateDocStore in test config
* Add latest docstring and tutorial changes
* Represent tables in WeaviateDocumentStore as nested lists
* Fix mypy
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* create uuid and dummy embeddding in weaviate doc store
* handle and test for duplicate non-uuid-formatted ids in weaviate
* add uuid and dummy embedding to doc strings
* Add latest docstring and tutorial changes
* Upgrade weaviate
* Include weaviate in common doc store test cases
* Add latest docstring and tutorial changes
* Exclude weaviate doc store from eval tests
* Incorporate index name in uuid generation
* Ignore mypy error
* Fix typo
* Restore DOCS without uuid and embeddings generated by weaviate
* Supply docs for retriever tests as fixture
* Limit scope of fixture to function instead of session
* Add comments
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Feat: Removing use of temp file while downloading archive from url along with adding CI for windows and mac platform
* Windows CI by default installing pytorch gpu hence updating CI to pick cpu version
* fixing mac cache build issue
* updating windows pip install command for torch
* another attempt
* updating ci
* Adding sudo
* fixing ls failure on windows
* another attempt to fix build issue
* Saving env variable of test files
* Adding debug log
* Github action differ on windows
* adding debug
* anohter attempt
* Windows have different ways to receive env
* fixing template
* minor fx
* Adding debug
* Removing use of json
* Adding back fromJson
* addin toJson
* removing print
* anohter attempt
* disabling parallel run at least for testing
* installing docker for mac runner
* correcting docker install command
* Linux dockers are not suported in windows
* Removing mac changes
* Upgrading pytorch
* using lts pytorch
* Separating win and ubuntu
* Install java 11
* enabling linux container env
* docker cli command
* docker cli command
* start elastic service
* List all service
* correcting service name
* Attempt to fix multiple test run
* convert to json
* another attempt to check
* Updating build cache step
* attempt
* Add tika
* Separating windows CI
* Changing CI name
* Skipping test which does not work in windows
* Skipping tests for windows
* create cleanup function in conftest
* adding skipif marker on tests
* Run windows PR on only push to master
* Addressing review comments
* Enabling windows ci for this PR
* Tika init is being called when importing tika function
* handling tika import issue
* handling tika import issue in test
* Fixing import issue
* removing tika fixure
* Removing fixture from tests
* Disable windows ci on pull request
* Add back extra pytorch install step
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Files moved, imports all broken
* Fix most imports and docstrings into
* Fix the paths to the modules in the API docs
* Add latest docstring and tutorial changes
* Add a few pipelines that were lost in the inports
* Fix a bunch of mypy warnings
* Add latest docstring and tutorial changes
* Create a file_classifier module
* Add docs for file_classifier
* Fixed most circular imports, now the REST API can start
* Add latest docstring and tutorial changes
* Tackling more mypy issues
* Reintroduce from FARM and fix last mypy issues hopefully
* Re-enable old-style imports
* Fix some more import from the top-level package in an attempt to sort out circular imports
* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests
* Change document_store into document_stores
* Update imports in tutorials
* Add latest docstring and tutorial changes
* Probably fixes summarizer tests
* Improve the old-style import allowing module imports (should work)
* Try to fix the docs
* Remove dedicated KnowledgeGraph page from autodocs
* Remove dedicated GraphRetriever page from autodocs
* Fix generate_docstrings.sh with an updated list of yaml files to look for
* Fix some more modules in the docs
* Fix the document stores docs too
* Fix a small issue on Tutorial14
* Add latest docstring and tutorial changes
* Add deprecation warning to old-style imports
* Remove stray folder and import Dict into dense.py
* Change import path for MLFlowLogger
* Add old loggers path to the import path aliases
* Fix debug output of convert_ipynb.py
* Fix circular import on BaseRetriever
* Missed one merge block
* re-run tutorial 5
* Fix imports in tutorial 5
* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base
* Add latest docstring and tutorial changes
* Fix typo in utils __init__
* Fix a few more imports
* Fix benchmarks too
* New-style imports in test_knowledge_graph
* Rollback setup.py
* Rollback squad_to_dpr too
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add delete_labels() except for weaviate doc store
* Add latest docstring and tutorial changes
* Add test for delete_labels()
* Adapt filter for label deletion to different doc stores in test
* Allow delete labels by _id in elasticsearch
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
* re-add bugfix after merge
* Add ids as optional parameter
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Make InMemoryDocumentStore accept and apply filters in delete_documents()
* Modify test_document_store.py to test the filtered deletion in memory, sql and milvus too
* Make FAISSDocumentStore accept and properly apply filters in delete_documents()
* Add latest docstring and tutorial changes
* Remove accidentally duplicated test
* Remove unnecessary decorators from test/test_document_store.py::test_delete_documents_with_filters
* Add embeddings count test for FAISS and Milvus; Milvus fails it.
* Fixed a bug that made Milvus not deleting embeddings
* Remove batch size parametrization in tests & update all documentstore's docstrings with a filter example
* Add latest docstring and tutorial changes
Co-authored-by: prafgup <prafulgupta6@gmail.com>
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* update readme and contributing.md
* update contributing
* adjust example
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* [UPDT] delete_all_documents() replaced by delete_documents()
* [UPDT] warning logs to be fixed
* [UPDT] delete_all_documents() renamed and the same method added
Co-authored-by: Ram Garg <ramgarg102@gmai.com>
* Fix duplicate question in Reader.eval()
* Add duplicate question support in document store
* Support duplicate questions in retriever eval
* Update tutorial
* Rename key_tuple
* Change error message
* Add warning when more than 6 labels
* Allow for label grouping options
* Add support for aggregating by label meta
* Satisfy mypy
* Fix duplicate question in Reader.eval()
* Add duplicate question support in document store
* Support duplicate questions in retriever eval
* Update tutorial
* Rename key_tuple
* Change error message
* Add warning when more than 6 labels
* Allow for label grouping options
* Add support for aggregating by label meta
* Satisfy mypy
* Make label field flexible, add docstrings
* Satisfy mypy
* Fix failing tests
* Adjust docstring
* Fix tutorial
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* [document_stores]Add the progressbar in update_embeddings() to track the overall documents progress closed#1037
* change 2nd level loop to docs. switch to tqdm.auto.
* [document_stores] Elasticsearch new method get_document_without_embedding_count() added.
* [test_case] Elasticsearch documentstore get_document_without_embedding_count() test case added.
* [document_stores] Add new bool arg in get_document_count() method and fixed#1082
* [document_stores] typo fixed#1082
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* using text hash as id to prevent document duplication. Also providing a way customize it.
* Add latest docstring and tutorial changes
* Fixing duplicate value test when text is same
* Adding test for duplicate ids in document store
* Changing exception to generic Exception type
* add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>