* added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore
* Add documentation on metadata filtering in docstring
* docs
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Adding support for additional distance metrics for Weaviate
Fixes#3000
* Updating the docs
* Fixing error texts
* Fixing issues raised by the review
* Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity`
* [EMPTY] Re-trigger CI
* Fixing things based on review
* [EMPTY] Re-trigger CI
* add Opensearch extras
* let OpenSearchDocumentStore use opensearch-py
* Update Documentation & Code Style
* fix a bug found after adding tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Upgrading Weaviate used for testing to 1.14.1 from 1.11.0
This has also brought up an issue with one of the test filtering for value "a". This test has started to fail, as "a" is a default stopword in Weaviate, so I have changed this test to look for value "c" instead of value "a" to get around the stopword issue.
* Weaviate client upgrade
From v3.3.3 to v3.6.0
* Adding BM25 Retrieval to Weaviate
Weaviate now supports BM25 retrieval in experiment mode and with some limitations (like it cannot be combined with filters).
This commit adds support for inverted index (BM25) querying against Weaviate.
* Running Black on the recent code changes
* Update Documentation & Code Style
* Fixing linting issues after code changes by black
* The BM25 query needs to be in all lowercase for now
The BM25 query needs to be provided all lowercase while the functionality is in experimental mode in Weaviate.
See https://app.slack.com/client/T0181DYT9KN/C017EG2SL3H/thread/C017EG2SL3H-1658790227.208119
* Fixing method parameter docstring to highlight that they are not supported in Weaviate
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* extract common code for ES and OS into a base class
* Update Documentation & Code Style
* give the base class a more obvious name
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* move OpenSearchDocumentStore into its own Python module
* Update Documentation & Code Style
* mark test with (sigh) elasticsearch
* skip opensearch tests on windows
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add failing test
* Remove `**kwargs` from docstores' `__init__` functions (#2407)
* Remove kwargs from ESDocStore subclasses
* Remove kwargs from subclasses of SQLDocumentStore
* Remove kwargs from Weaviate
* Revert change in pinecone
* Fix tests
* Fix retriever test wirh weaviate
* Change Exception into DocumentStoreError
* Update Documentation & Code Style
* Remove `**kwargs` from `FARMReader` (#2413)
* Remove FARMReader kwargs without trying to replace them functionally
* Update Documentation & Code Style
* enforce same index values before and after saving/loading eval dataframes (#2398)
* Add tests for missing `__init__` and `super().__init__()` in custom nodes (#2350)
* Add tests for missing init and super
* Update Documentation & Code Style
* change in with endswith
* Move test in pipeline.py and change test in pipeline_yaml.py
* Update Documentation & Code Style
* Use caplog to test the warning
* Update Documentation & Code Style
* move tests into test_pipeline and use get_config
* Update Documentation & Code Style
* Unmock version name
* Improve variadic args test
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* remove duplicate imports
* fix ungrouped-imports
* Fix wrong-import-position
* Fix unused-import
* pyproject.toml
* Working on wrong-import-order
* Solve wrong-import-order
* fix Pool import
* Move open_search_index_to_document_store and elasticsearch_index_to_document_store in elasticsearch.py
* remove Converter from modeling
* Fix mypy issues on adaptive_model.py
* create es_converter.py
* remove converter import
* change import path in tests
* Restructure REST API to not rely on global vars from search.apy and improve tests
* Fix openapi generator
* Move variable initialization
* Change type of FilterRequest.filters
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix 'bug' on Weaviate only returning max. 100 docs on get_all_documents
* Add type
* Update Weaviate version on the CI
* Fix bug on get_document_count where there are no documents
* Add more info in the docstrings of get_all_documents and get_all_documents_generator
* Add latest docstring and tutorial changes
* Apply Black
* Update Documentation & Code Style
* Trigger pipeline
* Update Documentation & Code Style
* Include StefanBogdan feedback
* Fix mypy issues and LogicalFilterClause
* Add more types
* Update Documentation & Code Style
* update setup.cfg
* Upgrade weaviate containers too
* Allow to filter for content field in Weaviate
* Use convert_to_weaviate instead of convert_to_pinecone
* Fix _get_all_documents_in_index
* Update docstrings and docs
* Catching an exception in get_document(s)_by_id
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* EvaluationSetClient for deepset cloud to fetch evaluation sets and labels for one specific evaluation set
* make DeepsetCloudDocumentStore able to fetch uploaded evaluation set names
* fix missing renaming of get_evaluation_set_names in DeepsetCloudDocumentStore
* update documentation for evaluation set functionality in deepset cloud document store
* DeepsetCloudDocumentStore tests for evaluation set functionality
* rename index to evaluation_set_name for DeepsetCloudDocumentStore evaluation set functionality
* raise DeepsetCloudError when no labels were found for evaluation set
* make use of .get_with_auto_paging in EvaluationSetClient
* Return result of get_with_auto_paging() as it parses the response already
* Make schema import source more specific
* fetch all evaluation sets for a workspace in deepset Cloud
* Rename evaluation_set_name to label_index
* make use of generator functionality for fetching labels
* Update Documentation & Code Style
* Adjust function input for DeepsetCloudDocumentStore.get_all_labels, adjust tests for it, fix typos, make linter happy
* Match error message with pytest.raises
* Update Documentation & Code Style
* DeepsetCloudDocumentStore.get_labels_count raises DeepsetCloudError when no evaluation set was found to count labels on
* remove unneeded import in tests
* DeepsetCloudDocumentStore tests, make reponse bodies a string through json.dumps
* DeepsetcloudDocumentStore.get_label_count - move raise to return
* stringify uuid before json.dump as uuid is not serilizable
* DeepsetcloudDocumentStore - adjust response mocking in tests
* DeepsetcloudDocumentStore - json dump response body in test
* DeepsetCloudDocumentStore introduce label_index, EvaluationSetClient rename label_index to evaluation_set
* Update Documentation & Code Style
* DeepsetCloudDocumentStore rename evaluation_set to evaluation_set_response as there is a name clash with the input variable
* DeepsetCloudDocumentStore - rename missed variable in test
* DeepsetCloudDocumentStore - rename missed label_index to index in doc string, rename label_index to evaluation_set in EvaluationSetClient
* Update Documentation & Code Style
* DeepsetCloudDocumentStore - update docstrings for EvaluationSetClient
* DeepsetCloudDocumentStore - fix typo in doc string
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* added core install and functionality of pinecone doc store (init, upsert, query, delete)
* implemented core functionality of Pinecone doc store
* Update Documentation & Code Style
* updated filtering to use Haystack filtering and reduced default batch_size
* Update Documentation & Code Style
* removed debugging code
* updated Pinecone filtering to use filter_utils
* removed uneeded methods and minor tweaks to current methods
* fixed typing issues
* Update Documentation & Code Style
* Allow filters in al methods except get_embedding_count
* Fix skipping document store tests
* Update Documentation & Code Style
* Fix handling of Milvus1 and Milvus2 in tests
* Update Documentation & Code Style
* Fix handling of Milvus1 and Milvus2 in tests
* Update Documentation & Code Style
* Remove SQL from tests requiring embeddings
* Update Documentation & Code Style
* Fix get_embedding_count of Milvus2
* Make sure to start Milvus2 tests with a new collection
* Add pinecone to test suite
* Update Documentation & Code Style
* Fix typing
* Update Documentation & Code Style
* Add pinecone to docstores dependendcy
* Add PineconeDocStore to API Documentation
* Add missing comma
* Update Documentation & Code Style
* Adapt format of doc strings
* Update Documentation & Code Style
* Set API key as environment variable
* Skip Pinecone tests in forks
* Add sleep after deleting index
* Add sleep after deleting index
* Add sleep after creating index
* Add check if index ready
* Remove printing of index stats
* Create new index for each pinecone test
* Use RestAPI instead of Python API for describe_index_stats
* Fix accessing describe_index_stats
* Remove usages of describe_index_stats
* Run pinecone tests separately
* Update Documentation & Code Style
* Add pdftotext to pinecone tests
* Remove sleep from doc store fixture
* Add describe_index_stats
* Remove unused imports
* Use pull_request_target trigger
* Revert use pull_request_target trigger
* Remove set_config
* Add os to conftest
* Integrate review comments
* Set include_values to False
* Remove quotation marks from pinecone.Index type
* Update Documentation & Code Style
* Update Documentation & Code Style
* Fix number of args in error messages
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* 'os' wrapper to function for brownfield support
* Changing function names and fixing default parameter values
* Including parameter keys
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes
* Make error composition work properly
* Clarify typing
* Help mypy a bit more
* Update Documentation & Code Style
* Enable autogenerated docs for Milvus1 and 2 separately
* Revert "Enable autogenerated docs for Milvus1 and 2 separately"
This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d.
* Update Documentation & Code Style
* Re-enable 'additionalProperties: False'
* Add pipeline.type to JSON Schema, was somehow forgotten
* Disable additionalProperties on the pipeline properties too
* Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the future)
* Cal super in PipelineValidationError
* Improve _read_pipeline_config_from_yaml's error handling
* Fix generate_json_schema.py to include document stores
* Fix json schemas (retro-fix 1.1.0 again)
* Improve custom errors printing, add link to docs
* Add function in BaseComponent to list its subclasses in a module
* Make some document stores base classes abstract
* Add marker 'integration' in pytest flags
* Slighly improve validation of pipelines at load
* Adding tests for YAML loading and validation
* Make custom_query Optional for validation issues
* Fix bug in _read_pipeline_config_from_yaml
* Improve error handling in BasePipeline and Pipeline and add DAG check
* Move json schema generation into haystack/nodes/_json_schema.py (useful for tests)
* Simplify errors slightly
* Add some YAML validation tests
* Remove load_from_config from BasePipeline, it was never used anyway
* Improve tests
* Include json-schemas in package
* Fix conftest imports
* Make BasePipeline abstract
* Improve mocking by making the test independent from the YAML version
* Add exportable_to_yaml decorator to forget about set_config on mock nodes
* Fix mypy errors
* Comment out one monkeypatch
* Fix typing again
* Improve error message for validation
* Add required properties to pipelines
* Fix YAML version for REST API YAMLs to 1.2.0
* Fix load_from_yaml call in load_from_deepset_cloud
* fix HaystackError.__getattr__
* Add super().__init__()in most nodes and docstore, comment set_config
* Remove type from REST API pipelines
* Remove useless init from doc2answers
* Call super in Seq3SeqGenerator
* Typo in deepsetcloud.py
* Fix rest api indexing error mismatch and mock version of JSON schema in all tests
* Working on pipeline tests
* Improve errors printing slightly
* Add back test_pipeline.yaml
* _json_schema.py supports different versions with identical schemas
* Add type to 0.7 schema for backwards compatibility
* Fix small bug in _json_schema.py
* Try alternative to generate json schemas on the CI
* Update Documentation & Code Style
* Make linux CI match autoformat CI
* Fix super-init-not-called
* Accidentally committed file
* Update Documentation & Code Style
* fix test_summarizer_translation.py's import
* Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args
* Fix json schema for ray tests too
* Update Documentation & Code Style
* Reintroduce validation
* Usa unstable version in tests and rest api
* Make unstable support the latest versions
* Update Documentation & Code Style
* Remove needless fixture
* Make type in pipeline optional in the strings validation
* Fix schemas
* Fix string validation for pipeline type
* Improve validate_config_strings
* Remove type from test p[ipelines
* Update Documentation & Code Style
* Fix test_pipeline
* Removing more type from pipelines
* Temporary CI patc
* Fix issue with exportable_to_yaml never invoking the wrapped init
* rm stray file
* pipeline tests are green again
* Linux CI now needs .[all] to generate the schema
* Bugfixes, pipeline tests seems to be green
* Typo in version after merge
* Implement missing methods in Weaviate
* Trying to avoid FAISS tests from running in the Milvus1 test suite
* Fix some stray test paths and faiss index dumping
* Fix pytest markers list
* Temporarily disable cache to be able to see tests failures
* Fix pyproject.toml syntax
* Use only tmp_path
* Fix preprocessor signature after merge
* Fix faiss bug
* Fix Ray test
* Fix documentation issue by removing quotes from faiss type
* Update Documentation & Code Style
* use document properly in preprocessor tests
* Update Documentation & Code Style
* make preprocessor capable of handling documents
* import document
* Revert support for documents in preprocessor, do later
* Fix bug in _json_schema.py that was breaking validation
* re-enable cache
* Update Documentation & Code Style
* Simplify calling _json_schema.py from the CI
* Remove redundant ABC inheritance
* Ensure exportable_to_yaml works only on implementations
* Rename subclass to class_ in Meta
* Make run() and get_config() abstract in BasePipeline
* Revert unintended change in preprocessor
* Move outgoing_edges_input_node check inside try block
* Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX
* Add check for a RecursionError on validate_config_strings
* Address usages of _pipeline_config in data silo and elasticsearch
* Rename _pipeline_config into _init_parameters
* Fix pytest marker and remove unused imports
* Remove most redundant ABCs
* Rename _init_parameters into _component_configuration
* Remove set_config and type from _component_configuration's dict
* Remove last instances of set_config and replace with super().__init__()
* Implement __init_subclass__ approach
* Simplify checks on the existence of _component_configuration
* Fix faiss issue
* Dynamic generation of node schemas & weed out old schemas
* Add debatable test
* Add docstring to debatable test
* Positive diff between schemas implemented
* Improve diff printing
* Rename REST API YAML files to trigger IDE validation
* Fix typing issues
* Fix more typing
* Typo in YAML filename
* Remove needless type:ignore
* Add tests
* Fix tests & validation feedback for accessory classes in custom nodes
* Refactor RAGeneratorType out
* Fix broken import in conftest
* Improve source error handling
* Remove unused import in test_eval.py breaking tests
* Fix changed error message in tests matches too
* Normalize generate_openapi_specs.py and generate_json_schema.py in the actions
* Fix path to generate_openapi_specs.py in autoformat.yml
* Update Documentation & Code Style
* Add test for FAISSDocumentStore-like situations (superclass with init params)
* Update Documentation & Code Style
* Fix indentation
* Remove commented set_config
* Store model_name_or_path in FARMReader to use in DistillationDataSilo
* Rename _component_configuration into _component_config
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Bring back init defs to api in v1.2
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* update remaining occurences of get_connection
* fix milvus2 import and fix wrong extra references
* change MilvusDocumentStore to Milvus1DocumentStore
* update milvus docstrings to reflect updated dependency management
* enable milvus 2 tests
* fix milvus2 env variable processing
* fix dropping collections for each milvus 2 test
* make Milvus 2 doc store tests work
* allow user to specify consistency level
* Fist attempt at running Milvus2 in the CI
* Install the correct pymilvus
* add batch deletion for milvus2
* change default from milvus 1 to milvus 2
* make milvus2 the default in the docstores extra
* Switch milvus1 and milvus2 in base test run on CI
* Rename docstore flags for pytest: 'milvus'->'milvus1', 'milvus2'->'milvus'
* Rename milvus.py->milvus1.py and milvus2x.py->milvus2.py
* Enable autogenerated docs for Milvus1 and 2 separately
* Partial fix to docstring of Milvus2DocumentStore
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* add filter classes
* update filter comments
* Add util classes for converting filters (#2123)
* Apply Black
* reintroduce eval functions to filter ops
* Update documentation
* update to latest pymilvus version
* Apply Black
* fixing type hints
* Apply Black
* update write_documents method of milvus2 doc store
* remove unnecessary method
* update init
* remove changes to milvus 2 as they are part of other PR
* remove changes to milvus 2 as they are part of other PR
* updating doc strings to match elastic search filter doc
* Update Documentation & Code Style
* add support for case where there is no meta data defined for key
* update behaviour in case of field not existing in entry
* Update Documentation & Code Style
* add test for InMemoryDocumentStore extended meta data filtering
* make type hint more precise
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* Upgrade pydoc-markdown and fix the YAMLs to work with it
* Pin pydoc-markdown to major version
* Generalize pydoc-markdown workflow
* Make a single Action to perform all tasks that require committing into the local branch
* Merge the code updates and the docs in the Linux CI to prevent the bot from always show the pipeline as green
* Installing Jupyter deps for Black
* Build cache before running generation tasks
* Add check not to run the code generation on master
* Simplify push action
* Add more test deps in setup.cfg and remove from GH Action workflow
* Remove forced upgrades on pip install
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>