* Add tests for missing init and super
* Update Documentation & Code Style
* change in with endswith
* Move test in pipeline.py and change test in pipeline_yaml.py
* Update Documentation & Code Style
* Use caplog to test the warning
* Update Documentation & Code Style
* move tests into test_pipeline and use get_config
* Update Documentation & Code Style
* Unmock version name
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* remove duplicate imports
* fix ungrouped-imports
* Fix wrong-import-position
* Fix unused-import
* pyproject.toml
* Working on wrong-import-order
* Solve wrong-import-order
* fix Pool import
* Move open_search_index_to_document_store and elasticsearch_index_to_document_store in elasticsearch.py
* remove Converter from modeling
* Fix mypy issues on adaptive_model.py
* create es_converter.py
* remove converter import
* change import path in tests
* Restructure REST API to not rely on global vars from search.apy and improve tests
* Fix openapi generator
* Move variable initialization
* Change type of FilterRequest.filters
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Delete files in _src
* Filter unused images and re-add images that were in use in docs/img
* Remove all usages of user-images.githubusercontent.com
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
* ensure correct embedding_encoder is loaded when embedding_model is a sentence-transformers model but model_format is missing or wrong
* minor refactoring
* do not update model_format and ensure a warning is logged when it could be wrong
* Apply black
* Apply black
Co-authored-by: Michele Pangrazzi <michele@wonderflow.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* extract extension based on file's content
* Add python-magic dependency
* fix the _estimate_extension function and lowercase the file extensions
* check if the FileTypeClassifier can be imported
* add test and new file types
* fix typing
* import Optional
* revert Optional and make sure a string is always returned
* fix test so that it skips markdown files
* Emulate Code & Docs action
* Generate schemas
* Tidy up test code & extensioness files
* Improve error messages
* Revert schema changes
* Emulate black and docs CI again
* Fix 'bug' on Weaviate only returning max. 100 docs on get_all_documents
* Add type
* Update Weaviate version on the CI
* Fix bug on get_document_count where there are no documents
* Add more info in the docstrings of get_all_documents and get_all_documents_generator
* Add latest docstring and tutorial changes
* Apply Black
* Update Documentation & Code Style
* Trigger pipeline
* Update Documentation & Code Style
* Include StefanBogdan feedback
* Fix mypy issues and LogicalFilterClause
* Add more types
* Update Documentation & Code Style
* update setup.cfg
* Upgrade weaviate containers too
* Allow to filter for content field in Weaviate
* Use convert_to_weaviate instead of convert_to_pinecone
* Fix _get_all_documents_in_index
* Update docstrings and docs
* Catching an exception in get_document(s)_by_id
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* EvaluationSetClient for deepset cloud to fetch evaluation sets and labels for one specific evaluation set
* make DeepsetCloudDocumentStore able to fetch uploaded evaluation set names
* fix missing renaming of get_evaluation_set_names in DeepsetCloudDocumentStore
* update documentation for evaluation set functionality in deepset cloud document store
* DeepsetCloudDocumentStore tests for evaluation set functionality
* rename index to evaluation_set_name for DeepsetCloudDocumentStore evaluation set functionality
* raise DeepsetCloudError when no labels were found for evaluation set
* make use of .get_with_auto_paging in EvaluationSetClient
* Return result of get_with_auto_paging() as it parses the response already
* Make schema import source more specific
* fetch all evaluation sets for a workspace in deepset Cloud
* Rename evaluation_set_name to label_index
* make use of generator functionality for fetching labels
* Update Documentation & Code Style
* Adjust function input for DeepsetCloudDocumentStore.get_all_labels, adjust tests for it, fix typos, make linter happy
* Match error message with pytest.raises
* Update Documentation & Code Style
* DeepsetCloudDocumentStore.get_labels_count raises DeepsetCloudError when no evaluation set was found to count labels on
* remove unneeded import in tests
* DeepsetCloudDocumentStore tests, make reponse bodies a string through json.dumps
* DeepsetcloudDocumentStore.get_label_count - move raise to return
* stringify uuid before json.dump as uuid is not serilizable
* DeepsetcloudDocumentStore - adjust response mocking in tests
* DeepsetcloudDocumentStore - json dump response body in test
* DeepsetCloudDocumentStore introduce label_index, EvaluationSetClient rename label_index to evaluation_set
* Update Documentation & Code Style
* DeepsetCloudDocumentStore rename evaluation_set to evaluation_set_response as there is a name clash with the input variable
* DeepsetCloudDocumentStore - rename missed variable in test
* DeepsetCloudDocumentStore - rename missed label_index to index in doc string, rename label_index to evaluation_set in EvaluationSetClient
* Update Documentation & Code Style
* DeepsetCloudDocumentStore - update docstrings for EvaluationSetClient
* DeepsetCloudDocumentStore - fix typo in doc string
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* use "must" instead of "should" for query-matching
* Update Documentation & Code Style
* fix mypy issue
* fix finding of new pylint version
* add test
* fix test_retrieval
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* added core install and functionality of pinecone doc store (init, upsert, query, delete)
* implemented core functionality of Pinecone doc store
* Update Documentation & Code Style
* updated filtering to use Haystack filtering and reduced default batch_size
* Update Documentation & Code Style
* removed debugging code
* updated Pinecone filtering to use filter_utils
* removed uneeded methods and minor tweaks to current methods
* fixed typing issues
* Update Documentation & Code Style
* Allow filters in al methods except get_embedding_count
* Fix skipping document store tests
* Update Documentation & Code Style
* Fix handling of Milvus1 and Milvus2 in tests
* Update Documentation & Code Style
* Fix handling of Milvus1 and Milvus2 in tests
* Update Documentation & Code Style
* Remove SQL from tests requiring embeddings
* Update Documentation & Code Style
* Fix get_embedding_count of Milvus2
* Make sure to start Milvus2 tests with a new collection
* Add pinecone to test suite
* Update Documentation & Code Style
* Fix typing
* Update Documentation & Code Style
* Add pinecone to docstores dependendcy
* Add PineconeDocStore to API Documentation
* Add missing comma
* Update Documentation & Code Style
* Adapt format of doc strings
* Update Documentation & Code Style
* Set API key as environment variable
* Skip Pinecone tests in forks
* Add sleep after deleting index
* Add sleep after deleting index
* Add sleep after creating index
* Add check if index ready
* Remove printing of index stats
* Create new index for each pinecone test
* Use RestAPI instead of Python API for describe_index_stats
* Fix accessing describe_index_stats
* Remove usages of describe_index_stats
* Run pinecone tests separately
* Update Documentation & Code Style
* Add pdftotext to pinecone tests
* Remove sleep from doc store fixture
* Add describe_index_stats
* Remove unused imports
* Use pull_request_target trigger
* Revert use pull_request_target trigger
* Remove set_config
* Add os to conftest
* Integrate review comments
* Set include_values to False
* Remove quotation marks from pinecone.Index type
* Update Documentation & Code Style
* Update Documentation & Code Style
* Fix number of args in error messages
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Similar test case seems to pass
* Update Documentation & Code Style
* Improve error message
* Slightly clarify info message
* Fix mismatch between node and node_class in the schema generation
* Remove condition that node class names cannot begin with Base and update tests
* Indentation
* Update Documentation & Code Style
* feedback
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* add basic telemetry features
* change pipeline_config to _component_config
* Update Documentation & Code Style
* add super().__init__() calls to error classes
* make posthog mock work with python 3.7
* Update Documentation & Code Style
* update link to docs web page
* log exceptions, send event for raised HaystackErrors, refactor Path(CONFIG_PATH)
* add comment on send_event in BaseComponent.init() and fix mypy
* mock NonPrivateParameters and fix pylint undefined-variable
* Update Documentation & Code Style
* check model path contains multiple /
* add test for writing to file
* add test for en-/disable telemetry
* Update Documentation & Code Style
* merge file deletion methods and ignore pylint global statement
* Update Documentation & Code Style
* set env variable in demo to activate telemetry
* fix mock of HAYSTACK_TELEMETRY_ENABLED
* fix mypy and linter
* add CI as env variable to execution contexts
* remove threading, add test for custom error event
* Update Documentation & Code Style
* simplify config/log file deletion
* add test for final event being sent
* force writing config file in test
* make test compatible with python 3.7
* switch to posthog production server
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* 'os' wrapper to function for brownfield support
* Changing function names and fixing default parameter values
* Including parameter keys
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes
* Make error composition work properly
* Clarify typing
* Help mypy a bit more
* Update Documentation & Code Style
* Enable autogenerated docs for Milvus1 and 2 separately
* Revert "Enable autogenerated docs for Milvus1 and 2 separately"
This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d.
* Update Documentation & Code Style
* Re-enable 'additionalProperties: False'
* Add pipeline.type to JSON Schema, was somehow forgotten
* Disable additionalProperties on the pipeline properties too
* Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the future)
* Cal super in PipelineValidationError
* Improve _read_pipeline_config_from_yaml's error handling
* Fix generate_json_schema.py to include document stores
* Fix json schemas (retro-fix 1.1.0 again)
* Improve custom errors printing, add link to docs
* Add function in BaseComponent to list its subclasses in a module
* Make some document stores base classes abstract
* Add marker 'integration' in pytest flags
* Slighly improve validation of pipelines at load
* Adding tests for YAML loading and validation
* Make custom_query Optional for validation issues
* Fix bug in _read_pipeline_config_from_yaml
* Improve error handling in BasePipeline and Pipeline and add DAG check
* Move json schema generation into haystack/nodes/_json_schema.py (useful for tests)
* Simplify errors slightly
* Add some YAML validation tests
* Remove load_from_config from BasePipeline, it was never used anyway
* Improve tests
* Include json-schemas in package
* Fix conftest imports
* Make BasePipeline abstract
* Improve mocking by making the test independent from the YAML version
* Add exportable_to_yaml decorator to forget about set_config on mock nodes
* Fix mypy errors
* Comment out one monkeypatch
* Fix typing again
* Improve error message for validation
* Add required properties to pipelines
* Fix YAML version for REST API YAMLs to 1.2.0
* Fix load_from_yaml call in load_from_deepset_cloud
* fix HaystackError.__getattr__
* Add super().__init__()in most nodes and docstore, comment set_config
* Remove type from REST API pipelines
* Remove useless init from doc2answers
* Call super in Seq3SeqGenerator
* Typo in deepsetcloud.py
* Fix rest api indexing error mismatch and mock version of JSON schema in all tests
* Working on pipeline tests
* Improve errors printing slightly
* Add back test_pipeline.yaml
* _json_schema.py supports different versions with identical schemas
* Add type to 0.7 schema for backwards compatibility
* Fix small bug in _json_schema.py
* Try alternative to generate json schemas on the CI
* Update Documentation & Code Style
* Make linux CI match autoformat CI
* Fix super-init-not-called
* Accidentally committed file
* Update Documentation & Code Style
* fix test_summarizer_translation.py's import
* Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args
* Fix json schema for ray tests too
* Update Documentation & Code Style
* Reintroduce validation
* Usa unstable version in tests and rest api
* Make unstable support the latest versions
* Update Documentation & Code Style
* Remove needless fixture
* Make type in pipeline optional in the strings validation
* Fix schemas
* Fix string validation for pipeline type
* Improve validate_config_strings
* Remove type from test p[ipelines
* Update Documentation & Code Style
* Fix test_pipeline
* Removing more type from pipelines
* Temporary CI patc
* Fix issue with exportable_to_yaml never invoking the wrapped init
* rm stray file
* pipeline tests are green again
* Linux CI now needs .[all] to generate the schema
* Bugfixes, pipeline tests seems to be green
* Typo in version after merge
* Implement missing methods in Weaviate
* Trying to avoid FAISS tests from running in the Milvus1 test suite
* Fix some stray test paths and faiss index dumping
* Fix pytest markers list
* Temporarily disable cache to be able to see tests failures
* Fix pyproject.toml syntax
* Use only tmp_path
* Fix preprocessor signature after merge
* Fix faiss bug
* Fix Ray test
* Fix documentation issue by removing quotes from faiss type
* Update Documentation & Code Style
* use document properly in preprocessor tests
* Update Documentation & Code Style
* make preprocessor capable of handling documents
* import document
* Revert support for documents in preprocessor, do later
* Fix bug in _json_schema.py that was breaking validation
* re-enable cache
* Update Documentation & Code Style
* Simplify calling _json_schema.py from the CI
* Remove redundant ABC inheritance
* Ensure exportable_to_yaml works only on implementations
* Rename subclass to class_ in Meta
* Make run() and get_config() abstract in BasePipeline
* Revert unintended change in preprocessor
* Move outgoing_edges_input_node check inside try block
* Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX
* Add check for a RecursionError on validate_config_strings
* Address usages of _pipeline_config in data silo and elasticsearch
* Rename _pipeline_config into _init_parameters
* Fix pytest marker and remove unused imports
* Remove most redundant ABCs
* Rename _init_parameters into _component_configuration
* Remove set_config and type from _component_configuration's dict
* Remove last instances of set_config and replace with super().__init__()
* Implement __init_subclass__ approach
* Simplify checks on the existence of _component_configuration
* Fix faiss issue
* Dynamic generation of node schemas & weed out old schemas
* Add debatable test
* Add docstring to debatable test
* Positive diff between schemas implemented
* Improve diff printing
* Rename REST API YAML files to trigger IDE validation
* Fix typing issues
* Fix more typing
* Typo in YAML filename
* Remove needless type:ignore
* Add tests
* Fix tests & validation feedback for accessory classes in custom nodes
* Refactor RAGeneratorType out
* Fix broken import in conftest
* Improve source error handling
* Remove unused import in test_eval.py breaking tests
* Fix changed error message in tests matches too
* Normalize generate_openapi_specs.py and generate_json_schema.py in the actions
* Fix path to generate_openapi_specs.py in autoformat.yml
* Update Documentation & Code Style
* Add test for FAISSDocumentStore-like situations (superclass with init params)
* Update Documentation & Code Style
* Fix indentation
* Remove commented set_config
* Store model_name_or_path in FARMReader to use in DistillationDataSilo
* Rename _component_configuration into _component_config
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* automatically convert to DeepsetCloudDocumentStore
* shorten info text.
* fix typo
* the -> this
* add test
* ensure request body has only DeepsetCloudDocumentStores
* mark test as elasticsearch to fix milvus1 ci
* update remaining occurences of get_connection
* fix milvus2 import and fix wrong extra references
* change MilvusDocumentStore to Milvus1DocumentStore
* update milvus docstrings to reflect updated dependency management
* enable milvus 2 tests
* fix milvus2 env variable processing
* fix dropping collections for each milvus 2 test
* make Milvus 2 doc store tests work
* allow user to specify consistency level
* Fist attempt at running Milvus2 in the CI
* Install the correct pymilvus
* add batch deletion for milvus2
* change default from milvus 1 to milvus 2
* make milvus2 the default in the docstores extra
* Switch milvus1 and milvus2 in base test run on CI
* Rename docstore flags for pytest: 'milvus'->'milvus1', 'milvus2'->'milvus'
* Rename milvus.py->milvus1.py and milvus2x.py->milvus2.py
* Enable autogenerated docs for Milvus1 and 2 separately
* Partial fix to docstring of Milvus2DocumentStore
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* add filter classes
* update filter comments
* Add util classes for converting filters (#2123)
* Apply Black
* reintroduce eval functions to filter ops
* Update documentation
* update to latest pymilvus version
* Apply Black
* fixing type hints
* Apply Black
* update write_documents method of milvus2 doc store
* remove unnecessary method
* update init
* remove changes to milvus 2 as they are part of other PR
* remove changes to milvus 2 as they are part of other PR
* updating doc strings to match elastic search filter doc
* Update Documentation & Code Style
* add support for case where there is no meta data defined for key
* update behaviour in case of field not existing in entry
* Update Documentation & Code Style
* add test for InMemoryDocumentStore extended meta data filtering
* make type hint more precise
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sarazanzo94@gmail.com>
* update remaining occurences of get_connection
* first commit to add extended metadata filtering support to sql
* fix bugs
* adding sql doc store instead of milvus
* removing updates to milvus2 from other PR
* fixing not operator
* delete left over line
* remove unnecessary import
* Update Documentation & Code Style
* fix circular import
* fix left over merge conflict
* Update Documentation & Code Style
* fix abstract class
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Make YAML files get the same version as Haystack and throw warning at load in case of mismatch
* Update version of most YAMLs in the codebase (aesthethic chamge, only to avoid the warning).
* Remove quotes from version in tests
* Fix version in generate_json_schema.py
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* join node should allow reciprocal rank fusion
* Update Documentation & Code Style
* add missing merging mode
* tuples are immutable
* take correct results from pipeline
* Update Documentation & Code Style
* Simple docstrings, use ValueError
* Use K=60
* Minor refactoring
* precalculate expected result in test
* Update Documentation & Code Style
* refactor to make more clear
* rm unused imports
* tests should test only one thing
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: dmigo <d.f.goryunov@gmail.com>
* move commandline args to global conftest
* correct test exclude paths
* Update Documentation & Code Style
* exclude test_generator_pipeline_with_translator from windows ci
* exclude further oom tests
* enable log_cli
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pass documents as extra param to eval
* pass documents via labels to eval
* rename param in docs
* Update Documentation & Code Style
* Revert "rename param in docs"
This reverts commit 2f4c2ec79575e9dd33a8300785f789a327df36f4.
* Revert "pass documents via labels to eval"
This reverts commit dcc51e41f2637d093d81c7d193b873c17c36b174.
* simplify iterating through labels and docs
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>