* fix milvus and faiss tests not running
* fix schema manually
* fix test_dpr_embedding test for milvus
* pip freeze on milvus tests
* fix milvus1 tests being executed: fix all_doc_stores order
* Revert "pip freeze on milvus tests"
This reverts commit 75ebb6f7e507bb8477e87d9e63b4a294f7946cab.
* make infer_required_doc_store more robust
* don't skip tests without docstore requirements
* use markers for docstore tests
* Use the %s syntax on all debug messages
* Use the %s syntax on some more debug messages
* Use the %s syntax on info messages
* Use the %s syntax on warning messages
* Use the %s syntax on error and exception messages
* mypy
* pylint
* trigger tutorials execution in CI
* trigger tutorials execution on CI
* black
* remove embeddings from repr
* fix Document `__repr__`
* address feedback
* mypy
* added meta fields for meta_config to be used during realtime testing of PineconeDocumentStore
* Add documentation on metadata filtering in docstring
* docs
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Adding support for additional distance metrics for Weaviate
Fixes #3000
* Updating the docs
* Fixing error texts
* Fixing issues raised by the review
* Addressing the last issue from the reviews - removing test `test_weaviate.py::test_similarity`
* [EMPTY] Re-trigger CI
* Fixing things based on review
* [EMPTY] Re-trigger CI
* add Opensearch extras
* let OpenSearchDocumentStore use opensearch-py
* Update Documentation & Code Style
* fix a bug found after adding tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Unify CI tests (from #2466)
* Update Documentation & Code Style
* Change folder names
* Fix markers list
* Remove marker 'slow', replaced with 'integration'
* Soften children check
* Start ES first so it has time to boot while Python is setup
* Run the full workflow
* Try to make pip upgrade on Windows
* Set KG tests as integration
* Update Documentation & Code Style
* typo
* faster pylint
* Make Pylint use the cache
* filter diff files for pylint
* debug pylint statement
* revert pylint changes
* Remove path from asserted log (fails on Windows)
* Skip preprocessor test on Windows
* Tackling Windows specific failures
* Fix pytest command for windows suites
* Remove \ from command
* Move poppler test into integration
* Skip opensearch test on windows
* Add tolerance in reader sas score for Windows
* Another pytorch approx
* Raise time limit for unit tests :(
* Skip poppler test on Windows CI
* Specify to pull with FF only in docs check
* temporarily run the docs check immediately
* Allow merge commit for now
* Try without fetch depth
* Accelerating test
* Accelerating test
* Add repository and ref alongside fetch-depth
* Separate out code&docs check from tests
* Use setup-python cache
* Delete custom action
* Remove the pull step in the docs check, will find a way to run on bot commits
* Add requirements.txt in .github for caching
* Actually install dependencies
* Change deps group for pylint
* Unclear why the requirements.txt is still required :/
* Fix the code check python setup
* Install all deps for pylint
* Make the autoformat check depend on tests and doc updates workflows
* Try installing dependencies in another order
* Try again to install the deps
* quoting the paths
* Add back the requirements
* Try again to install rest_api and ui
* Change deps group
* Duplicate haystack install line
* See if the cache is the problem
* Disable also in mypy, who knows
* split the install step
* Split install step everywhere
* Revert "Separate out code&docs check from tests"
This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd.
* Add back the action
* Proactive support for audio (see text2speech branch)
* Fix label generator tests
* Remove install of libsndfile1 on win temporarily
* exclude audio tests on win
* install ffmpeg for integration tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Remove BasePipeline and make a module for RayPipeline
* Can load pipelines from yaml, plenty of issues left
* Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it
* Fix pipeline tests
* Move some tests out of test_pipeline.py and create MockDenseRetriever
* mypy and pylint (silencing too-many-public-methods)
* Fix issue found in some yaml files and in schema files
* Fix paths to YAML and fix some typos in Ray
* Fix eval tests
* Simplify MockDenseRetriever
* Fix Ray test
* Accidentally pushed merge conflict, fixed
* Typo in schemas
* Typo in _json_schema.py
* Slightly reduce noisiness of version validation warnings
* Fix version logs tests
* Fix version logs tests again
* remove seemingly unused file
* Add check and test to avoid adding the same node to the pipeline twice
* Update Documentation & Code Style
* Revert config to pipeline_config
* Remove unused import
* Complete reverting to pipeline_config
* Some more stray config=
* Update Documentation & Code Style
* Feedback
* Move back other_nodes tests into pipeline tests temporarily
* Update Documentation & Code Style
* Fixing tests
* Update Documentation & Code Style
* Fixing ray and standard pipeline tests
* Rename colliding load() methods in dense retrievers and faiss
* Update Documentation & Code Style
* Fix mypy on ray.py as well
* Add check for no root node
* Fix tests to use load_from_directory and load_index
* Try to workaround the disabled add_node of RayPipeline
* Update Documentation & Code Style
* Fix Ray test
* Fix FAISS tests
* Relax class check in _add_node_to_pipeline_graph
* Update Documentation & Code Style
* Try to fix mypy in ray.py
* unused import
* Try another fix for Ray
* Fix connector tests
* Update Documentation & Code Style
* Fix ray
* Update Documentation & Code Style
* use BaseComponent.load() in pipelines/base.py
* another round of feedback
* stray BaseComponent.load()
* Update Documentation & Code Style
* Fix FAISS tests too
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
* Add failing test
* Remove `**kwargs` from docstores' `__init__` functions (#2407)
* Remove kwargs from ESDocStore subclasses
* Remove kwargs from subclasses of SQLDocumentStore
* Remove kwargs from Weaviate
* Revert change in pinecone
* Fix tests
* Fix retriever test with weaviate
* Change Exception into DocumentStoreError
* Update Documentation & Code Style
* Remove `**kwargs` from `FARMReader` (#2413)
* Remove FARMReader kwargs without trying to replace them functionally
* Update Documentation & Code Style
* enforce same index values before and after saving/loading eval dataframes (#2398)
* Add tests for missing `__init__` and `super().__init__()` in custom nodes (#2350)
* Add tests for missing init and super
* Update Documentation & Code Style
* change in with endswith
* Move test in pipeline.py and change test in pipeline_yaml.py
* Update Documentation & Code Style
* Use caplog to test the warning
* Update Documentation & Code Style
* move tests into test_pipeline and use get_config
* Update Documentation & Code Style
* Unmock version name
* Improve variadic args test
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix 'bug' on Weaviate only returning max. 100 docs on get_all_documents
* Add type
* Update Weaviate version on the CI
* Fix bug on get_document_count where there are no documents
* Add more info in the docstrings of get_all_documents and get_all_documents_generator
* Add latest docstring and tutorial changes
* Apply Black
* Update Documentation & Code Style
* Trigger pipeline
* Update Documentation & Code Style
* Include StefanBogdan feedback
* Fix mypy issues and LogicalFilterClause
* Add more types
* Update Documentation & Code Style
* update setup.cfg
* Upgrade weaviate containers too
* Allow to filter for content field in Weaviate
* Use convert_to_weaviate instead of convert_to_pinecone
* Fix _get_all_documents_in_index
* Update docstrings and docs
* Catching an exception in get_document(s)_by_id
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* added core install and functionality of pinecone doc store (init, upsert, query, delete)
* implemented core functionality of Pinecone doc store
* Update Documentation & Code Style
* updated filtering to use Haystack filtering and reduced default batch_size
* Update Documentation & Code Style
* removed debugging code
* updated Pinecone filtering to use filter_utils
* removed unneeded methods and minor tweaks to current methods
* fixed typing issues
* Update Documentation & Code Style
* Allow filters in all methods except get_embedding_count
* Fix skipping document store tests
* Update Documentation & Code Style
* Fix handling of Milvus1 and Milvus2 in tests
* Update Documentation & Code Style
* Fix handling of Milvus1 and Milvus2 in tests
* Update Documentation & Code Style
* Remove SQL from tests requiring embeddings
* Update Documentation & Code Style
* Fix get_embedding_count of Milvus2
* Make sure to start Milvus2 tests with a new collection
* Add pinecone to test suite
* Update Documentation & Code Style
* Fix typing
* Update Documentation & Code Style
* Add pinecone to docstores dependency
* Add PineconeDocStore to API Documentation
* Add missing comma
* Update Documentation & Code Style
* Adapt format of doc strings
* Update Documentation & Code Style
* Set API key as environment variable
* Skip Pinecone tests in forks
* Add sleep after deleting index
* Add sleep after deleting index
* Add sleep after creating index
* Add check if index ready
* Remove printing of index stats
* Create new index for each pinecone test
* Use RestAPI instead of Python API for describe_index_stats
* Fix accessing describe_index_stats
* Remove usages of describe_index_stats
* Run pinecone tests separately
* Update Documentation & Code Style
* Add pdftotext to pinecone tests
* Remove sleep from doc store fixture
* Add describe_index_stats
* Remove unused imports
* Use pull_request_target trigger
* Revert use pull_request_target trigger
* Remove set_config
* Add os to conftest
* Integrate review comments
* Set include_values to False
* Remove quotation marks from pinecone.Index type
* Update Documentation & Code Style
* Update Documentation & Code Style
* Fix number of args in error messages
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* add basic telemetry features
* change pipeline_config to _component_config
* Update Documentation & Code Style
* add super().__init__() calls to error classes
* make posthog mock work with python 3.7
* Update Documentation & Code Style
* update link to docs web page
* log exceptions, send event for raised HaystackErrors, refactor Path(CONFIG_PATH)
* add comment on send_event in BaseComponent.init() and fix mypy
* mock NonPrivateParameters and fix pylint undefined-variable
* Update Documentation & Code Style
* check model path contains multiple /
* add test for writing to file
* add test for en-/disable telemetry
* Update Documentation & Code Style
* merge file deletion methods and ignore pylint global statement
* Update Documentation & Code Style
* set env variable in demo to activate telemetry
* fix mock of HAYSTACK_TELEMETRY_ENABLED
* fix mypy and linter
* add CI as env variable to execution contexts
* remove threading, add test for custom error event
* Update Documentation & Code Style
* simplify config/log file deletion
* add test for final event being sent
* force writing config file in test
* make test compatible with python 3.7
* switch to posthog production server
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes
* Make error composition work properly
* Clarify typing
* Help mypy a bit more
* Update Documentation & Code Style
* Enable autogenerated docs for Milvus1 and 2 separately
* Revert "Enable autogenerated docs for Milvus1 and 2 separately"
This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d.
* Update Documentation & Code Style
* Re-enable 'additionalProperties: False'
* Add pipeline.type to JSON Schema, was somehow forgotten
* Disable additionalProperties on the pipeline properties too
* Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the future)
* Call super in PipelineValidationError
* Improve _read_pipeline_config_from_yaml's error handling
* Fix generate_json_schema.py to include document stores
* Fix json schemas (retro-fix 1.1.0 again)
* Improve custom errors printing, add link to docs
* Add function in BaseComponent to list its subclasses in a module
* Make some document stores base classes abstract
* Add marker 'integration' in pytest flags
* Slightly improve validation of pipelines at load
* Adding tests for YAML loading and validation
* Make custom_query Optional for validation issues
* Fix bug in _read_pipeline_config_from_yaml
* Improve error handling in BasePipeline and Pipeline and add DAG check
* Move json schema generation into haystack/nodes/_json_schema.py (useful for tests)
* Simplify errors slightly
* Add some YAML validation tests
* Remove load_from_config from BasePipeline, it was never used anyway
* Improve tests
* Include json-schemas in package
* Fix conftest imports
* Make BasePipeline abstract
* Improve mocking by making the test independent from the YAML version
* Add exportable_to_yaml decorator to forget about set_config on mock nodes
* Fix mypy errors
* Comment out one monkeypatch
* Fix typing again
* Improve error message for validation
* Add required properties to pipelines
* Fix YAML version for REST API YAMLs to 1.2.0
* Fix load_from_yaml call in load_from_deepset_cloud
* fix HaystackError.__getattr__
* Add super().__init__() in most nodes and docstore, comment set_config
* Remove type from REST API pipelines
* Remove useless init from doc2answers
* Call super in Seq2SeqGenerator
* Typo in deepsetcloud.py
* Fix rest api indexing error mismatch and mock version of JSON schema in all tests
* Working on pipeline tests
* Improve errors printing slightly
* Add back test_pipeline.yaml
* _json_schema.py supports different versions with identical schemas
* Add type to 0.7 schema for backwards compatibility
* Fix small bug in _json_schema.py
* Try alternative to generate json schemas on the CI
* Update Documentation & Code Style
* Make linux CI match autoformat CI
* Fix super-init-not-called
* Accidentally committed file
* Update Documentation & Code Style
* fix test_summarizer_translation.py's import
* Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args
* Fix json schema for ray tests too
* Update Documentation & Code Style
* Reintroduce validation
* Use unstable version in tests and rest api
* Make unstable support the latest versions
* Update Documentation & Code Style
* Remove needless fixture
* Make type in pipeline optional in the strings validation
* Fix schemas
* Fix string validation for pipeline type
* Improve validate_config_strings
* Remove type from test pipelines
* Update Documentation & Code Style
* Fix test_pipeline
* Removing more type from pipelines
* Temporary CI patch
* Fix issue with exportable_to_yaml never invoking the wrapped init
* rm stray file
* pipeline tests are green again
* Linux CI now needs .[all] to generate the schema
* Bugfixes, pipeline tests seems to be green
* Typo in version after merge
* Implement missing methods in Weaviate
* Trying to avoid FAISS tests from running in the Milvus1 test suite
* Fix some stray test paths and faiss index dumping
* Fix pytest markers list
* Temporarily disable cache to be able to see tests failures
* Fix pyproject.toml syntax
* Use only tmp_path
* Fix preprocessor signature after merge
* Fix faiss bug
* Fix Ray test
* Fix documentation issue by removing quotes from faiss type
* Update Documentation & Code Style
* use document properly in preprocessor tests
* Update Documentation & Code Style
* make preprocessor capable of handling documents
* import document
* Revert support for documents in preprocessor, do later
* Fix bug in _json_schema.py that was breaking validation
* re-enable cache
* Update Documentation & Code Style
* Simplify calling _json_schema.py from the CI
* Remove redundant ABC inheritance
* Ensure exportable_to_yaml works only on implementations
* Rename subclass to class_ in Meta
* Make run() and get_config() abstract in BasePipeline
* Revert unintended change in preprocessor
* Move outgoing_edges_input_node check inside try block
* Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX
* Add check for a RecursionError on validate_config_strings
* Address usages of _pipeline_config in data silo and elasticsearch
* Rename _pipeline_config into _init_parameters
* Fix pytest marker and remove unused imports
* Remove most redundant ABCs
* Rename _init_parameters into _component_configuration
* Remove set_config and type from _component_configuration's dict
* Remove last instances of set_config and replace with super().__init__()
* Implement __init_subclass__ approach
* Simplify checks on the existence of _component_configuration
* Fix faiss issue
* Dynamic generation of node schemas & weed out old schemas
* Add debatable test
* Add docstring to debatable test
* Positive diff between schemas implemented
* Improve diff printing
* Rename REST API YAML files to trigger IDE validation
* Fix typing issues
* Fix more typing
* Typo in YAML filename
* Remove needless type:ignore
* Add tests
* Fix tests & validation feedback for accessory classes in custom nodes
* Refactor RAGeneratorType out
* Fix broken import in conftest
* Improve source error handling
* Remove unused import in test_eval.py breaking tests
* Fix changed error message in tests matches too
* Normalize generate_openapi_specs.py and generate_json_schema.py in the actions
* Fix path to generate_openapi_specs.py in autoformat.yml
* Update Documentation & Code Style
* Add test for FAISSDocumentStore-like situations (superclass with init params)
* Update Documentation & Code Style
* Fix indentation
* Remove commented set_config
* Store model_name_or_path in FARMReader to use in DistillationDataSilo
* Rename _component_configuration into _component_config
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* update remaining occurrences of get_connection
* fix milvus2 import and fix wrong extra references
* change MilvusDocumentStore to Milvus1DocumentStore
* update milvus docstrings to reflect updated dependency management
* enable milvus 2 tests
* fix milvus2 env variable processing
* fix dropping collections for each milvus 2 test
* make Milvus 2 doc store tests work
* allow user to specify consistency level
* First attempt at running Milvus2 in the CI
* Install the correct pymilvus
* add batch deletion for milvus2
* change default from milvus 1 to milvus 2
* make milvus2 the default in the docstores extra
* Switch milvus1 and milvus2 in base test run on CI
* Rename docstore flags for pytest: 'milvus'->'milvus1', 'milvus2'->'milvus'
* Rename milvus.py->milvus1.py and milvus2x.py->milvus2.py
* Enable autogenerated docs for Milvus1 and 2 separately
* Partial fix to docstring of Milvus2DocumentStore
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* move commandline args to global conftest
* correct test exclude paths
* Update Documentation & Code Style
* exclude test_generator_pipeline_with_translator from windows ci
* exclude further oom tests
* enable log_cli
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Upgrade pydoc-markdown and fix the YAMLs to work with it
* Pin pydoc-markdown to major version
* Generalize pydoc-markdown workflow
* Make a single Action to perform all tasks that require committing into the local branch
* Merge the code updates and the docs in the Linux CI to prevent the bot from always showing the pipeline as green
* Installing Jupyter deps for Black
* Build cache before running generation tasks
* Add check not to run the code generation on master
* Simplify push action
* Add more test deps in setup.cfg and remove from GH Action workflow
* Remove forced upgrades on pip install
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Testing black on ui/
* Applying black on docstores
* Add latest docstring and tutorial changes
* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too
* Remove comments
* Relax constraints on pydoc-markdown
* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade
* Fix a couple of bugs
* Add a type: ignore that was missing somehow
* Give path to black
* Apply Black
* Apply Black
* Relocate a couple of type: ignore
* Update documentation
* Make Linux CI run after applying Black
* Triggering Black
* Apply Black
* Remove dependency, does not work well
* Remove manually double trailing commas
* Update documentation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Make FileTypeClassifier more flexible
* Make supported_types a init parameter
* Add tests and fix a couple of bugs
* Formatting
* Fix mypy
* Implement feedback
* First attempt at using setup.cfg for dependency management
* Trying the new package on the CI and in Docker too
* Add composite extras_require
* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports
* Fix bug on class import and rephrase error message
* Introduce typing for optional modules and add type: ignore in sparse.py
* Include importlib_metadata backport for py3.7
* Add colab group to extras_require
* Fix pillow version
* Fix grpcio
* Separate out the crawler as another extra
* Make paths relative in rest_api and ui
* Update the test matrix in the CI
* Add try catch statements around the optional imports too to account for direct imports
* Never mix direct deps with self-references and add ES deps to the base install
* Refactor several paths in tests to make them insensitive to the execution path
* Include tstadel review and re-introduce Milvus1 in the tests suite, to fix
* Wrap pdf conversion utils into safe_import
* Update some tutorials and revert to Milvus1 as default for now, see #2067
* Fix mypy config
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* minimal DCDocumentStore
* support filters
* implement get_documents_by_id
* handle not existing documents
* add docstrings
* auth added
* add tests
* generate docs
* Add latest docstring and tutorial changes
* add responses to dev dependencies
* fix tests
* support query() and query_by_embedding()
* Add latest docstring and tutorial changes
* query tests added
* read api_key and api_endpoint from env
* Add latest docstring and tutorial changes
* support query() and query_by_embedding()
* query tests added
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
* support dynamic similarity and return_embedding values
* Add latest docstring and tutorial changes
* adjust KeywordDocumentStore description
* refactoring
* Add latest docstring and tutorial changes
* implement get_document_count and raise on all not implemented methods
* Add latest docstring and tutorial changes
* don't use abbreviation DC in comments and errors
* Add latest docstring and tutorial changes
* docstring added to KeywordDocumentStore
* Add latest docstring and tutorial changes
* enhanced api key set
* split tests into two parts
* change setup.py in order to work around build cache
* added link
* Add latest docstring and tutorial changes
* rename DCDocumentStore to DeepsetCloudDocumentStore
* Add latest docstring and tutorial changes
* remove dc.py
* reinsert link to docs
* fix imports
* Add latest docstring and tutorial changes
* better test structure
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>
* Properly fix MetaDocumentORM and MetaLabelORM with composite foreign key constraints
* update_document_meta() was not using index properly
* Exclude ES and Memory from the cosine_sanity_check test
* move ensure_ids_are_correct_uuids in conftest and move one test back to faiss & milvus suite
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* align document store similarity functions
* remove unnecessary imports
* undone accidental change
* stopped weaviate from pretending to support dot product similarity
* stopped weaviate from pretending to support dot product similarity
* Add latest docstring and tutorial changes
* fix fixture params for document stores
* use cosine similarity for most tests
* fix cosine similarity test
* fix faiss test
* fix weaviate test
* fix accidental deletion
* fix document_store fixture
* test fix; shouldn't be merged
* fix test_normalize_embeddings_diff_shapes
* probably a better fix
* fix for parameter combinations
* revert new pytest_generate_tests functionality
* simplify pytest_generate_tests
* normalize embeddings for test_dpr_embedding
* add to faiss doc that embeddings are normalized
* Add latest docstring and tutorial changes
* remove unnecessary parameters and add comments
* simplify two lines of memory.py into one
* test similarity scores with smaller language model
* fix test_similarity_score
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>