* move OpenSearchDocumentStore into its own Python module
* Update Documentation & Code Style
* mark test with (sigh) elasticsearch
* skip opensearch tests on windows
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Do not show success message on failed evalset upload
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Unify CI tests (from #2466)
* Update Documentation & Code Style
* Change folder names
* Fix markers list
* Remove marker 'slow', replaced with 'integration'
* Soften children check
* Start ES first so it has time to boot while Python is setup
* Run the full workflow
* Try to make pip upgrade on Windows
* Set KG tests as integration
* Update Documentation & Code Style
* typo
* faster pylint
* Make Pylint use the cache
* filter diff files for pylint
* debug pylint statement
* revert pylint changes
* Remove path from asserted log (fails on Windows)
* Skip preprocessor test on Windows
* Tackling Windows specific failures
* Fix pytest command for windows suites
* Remove \ from command
* Move poppler test into integration
* Skip opensearch test on windows
* Add tolerance in reader sas score for Windows
* Another pytorch approx
* Raise time limit for unit tests :(
* Skip poppler test on Windows CI
* Specify to pull with FF only in docs check
* temporarily run the docs check immediately
* Allow merge commit for now
* Try without fetch depth
* Accelerating test
* Accelerating test
* Add repository and ref alongside fetch-depth
* Separate out code&docs check from tests
* Use setup-python cache
* Delete custom action
* Remove the pull step in the docs check, will find a way to run on bot commits
* Add requirements.txt in .github for caching
* Actually install dependencies
* Change deps group for pylint
* Unclear why the requirements.txt is still required :/
* Fix the code check python setup
* Install all deps for pylint
* Make the autoformat check depend on tests and doc updates workflows
* Try installing dependencies in another order
* Try again to install the deps
* quoting the paths
* Add back the requirements
* Try again to install rest_api and ui
* Change deps group
* Duplicate haystack install line
* See if the cache is the problem
* Disable also in mypy, who knows
* split the install step
* Split install step everywhere
* Revert "Separate out code&docs check from tests"
This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd.
* Add back the action
* Proactive support for audio (see text2speech branch)
* Fix label generator tests
* Remove install of libsndfile1 on win temporarily
* exclude audio tests on win
* install ffmpeg for integration tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Rewrite crawler tests (very slow) and fix small crawler bug
* Update Documentation & Code Style
* compile the regex only once
* Factor out the html files & add content check to most tests
* Clarify that even starting URLs can be excluded
* Update Documentation & Code Style
* Change signature
* Fix failing test
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add possibility to upload evaluation sets to DC
* fix test_eval sas comparisons
* quickwin docstring feedback changes
* Add hint about annotation tool and mark optional and required columns
* minor changes to docstrings
* Do not deepcopy in get_components_definitions
* Update Documentation & Code Style
* comment
* unused import
* Add test to ensure env vars don't overwrite _component_config
* Update Documentation & Code Style
* Add test for get_config
* Add test to show the rename is not sufficient
* Update Documentation & Code Style
* copy only if it's strictly necessary
* Update Documentation & Code Style
* Apply suggestions from code review
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
* review feedback
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
* exit the while loop when we query fewer documents than available in Weaviate
* use monkeypatch fixture, remove unused markers
* we know key is there, use brackets to get the value
* use custom exception
* add warning message when we hit the QUERY_MAXIMUM_RESULTS problem
* restore pytest marker
* removed unused import
* make the warning message more clear
* Ray pipelines now validate
* Update Documentation & Code Style
* rename Ray pipeline in tests
* Add extras:ray to the test pipeline
* pylint
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Move super in OpenSearchDocumentStore and add small test
* Update Documentation & Code Style
* Add Opensearch container to the CI
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Remove BasePipeline and make a module for RayPipeline
* Can load pipelines from yaml, plenty of issues left
* Extract graph validation logic into _add_node_to_pipeline_graph & refactor load_from_config and add_node to use it
* Fix pipeline tests
* Move some tests out of test_pipeline.py and create MockDenseRetriever
* mypy and pylint (silencing too-many-public-methods)
* Fix issue found in some yaml files and in schema files
* Fix paths to YAML and fix some typos in Ray
* Fix eval tests
* Simplify MockDenseRetriever
* Fix Ray test
* Accidentally pushed merge conflict, fixed
* Typo in schemas
* Typo in _json_schema.py
* Slightly reduce noisiness of version validation warnings
* Fix version logs tests
* Fix version logs tests again
* remove seemingly unused file
* Add check and test to avoid adding the same node to the pipeline twice
* Update Documentation & Code Style
* Revert config to pipeline_config
* Remove unused import
* Complete reverting to pipeline_config
* Some more stray config=
* Update Documentation & Code Style
* Feedback
* Move back other_nodes tests into pipeline tests temporarily
* Update Documentation & Code Style
* Fixing tests
* Update Documentation & Code Style
* Fixing ray and standard pipeline tests
* Rename colliding load() methods in dense retrievers and faiss
* Update Documentation & Code Style
* Fix mypy on ray.py as well
* Add check for no root node
* Fix tests to use load_from_directory and load_index
* Try to workaround the disabled add_node of RayPipeline
* Update Documentation & Code Style
* Fix Ray test
* Fix FAISS tests
* Relax class check in _add_node_to_pipeline_graph
* Update Documentation & Code Style
* Try to fix mypy in ray.py
* unused import
* Try another fix for Ray
* Fix connector tests
* Update Documentation & Code Style
* Fix ray
* Update Documentation & Code Style
* use BaseComponent.load() in pipelines/base.py
* another round of feedback
* stray BaseComponent.load()
* Update Documentation & Code Style
* Fix FAISS tests too
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
* Add support for aliases in elasticsearch document store
* Add alias support for OpenSearch
* Missing variable index
* Update Documentation & Code Style
* Add unit test for elasticsearch alias support
* Fix unit test when index is not compatible with haystack
* Fix auto format conflict
* Add comment explaining for loop for alias
* Update Documentation & Code Style
Co-authored-by: Jonathan Gallon <jonathan.gallon@totalenergies.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* Add windows specific package for python-magic
* Disable some tests on Windows and add explanatory warning in case of issues with libmagic
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Change exception into warning, add strict_version param, and remove compatibility between schemas
* Simplify update_json_schema
* Rename unstable into master
* Prevent validate_config from changing the config to validate
* Fix version validation and add tests
* Rename master into ignore
* Complete parameter rename
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add failing test
* Remove `**kwargs` from docstores' `__init__` functions (#2407)
* Remove kwargs from ESDocStore subclasses
* Remove kwargs from subclasses of SQLDocumentStore
* Remove kwargs from Weaviate
* Revert change in pinecone
* Fix tests
* Fix retriever test with weaviate
* Change Exception into DocumentStoreError
* Update Documentation & Code Style
* Remove `**kwargs` from `FARMReader` (#2413)
* Remove FARMReader kwargs without trying to replace them functionally
* Update Documentation & Code Style
* enforce same index values before and after saving/loading eval dataframes (#2398)
* Add tests for missing `__init__` and `super().__init__()` in custom nodes (#2350)
* Add tests for missing init and super
* Update Documentation & Code Style
* change `in` with `endswith`
* Move test in pipeline.py and change test in pipeline_yaml.py
* Update Documentation & Code Style
* Use caplog to test the warning
* Update Documentation & Code Style
* move tests into test_pipeline and use get_config
* Update Documentation & Code Style
* Unmock version name
* Improve variadic args test
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* remove duplicate imports
* fix ungrouped-imports
* Fix wrong-import-position
* Fix unused-import
* pyproject.toml
* Working on wrong-import-order
* Solve wrong-import-order
* fix Pool import
* Move open_search_index_to_document_store and elasticsearch_index_to_document_store in elasticsearch.py
* remove Converter from modeling
* Fix mypy issues on adaptive_model.py
* create es_converter.py
* remove converter import
* change import path in tests
* Restructure REST API to not rely on global vars from search.py and improve tests
* Fix openapi generator
* Move variable initialization
* Change type of FilterRequest.filters
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Delete files in _src
* Filter unused images and re-add images that were in use in docs/img
* Remove all usages of user-images.githubusercontent.com
Co-authored-by: ZanSara <sarazanzo94@gmail.com>
* ensure correct embedding_encoder is loaded when embedding_model is a sentence-transformers model but model_format is missing or wrong
* minor refactoring
* do not update model_format and ensure a warning is logged when it could be wrong
* Apply black
* Apply black
Co-authored-by: Michele Pangrazzi <michele@wonderflow.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* extract extension based on file's content
* Add python-magic dependency
* fix the _estimate_extension function and lowercase the file extensions
* check if the FileTypeClassifier can be imported
* add test and new file types
* fix typing
* import Optional
* revert Optional and make sure a string is always returned
* fix test so that it skips markdown files
* Emulate Code & Docs action
* Generate schemas
* Tidy up test code & extensionless files
* Improve error messages
* Revert schema changes
* Emulate black and docs CI again
* Fix 'bug' on Weaviate only returning max. 100 docs on get_all_documents
* Add type
* Update Weaviate version on the CI
* Fix bug on get_document_count where there are no documents
* Add more info in the docstrings of get_all_documents and get_all_documents_generator
* Add latest docstring and tutorial changes
* Apply Black
* Update Documentation & Code Style
* Trigger pipeline
* Update Documentation & Code Style
* Include StefanBogdan feedback
* Fix mypy issues and LogicalFilterClause
* Add more types
* Update Documentation & Code Style
* update setup.cfg
* Upgrade weaviate containers too
* Allow to filter for content field in Weaviate
* Use convert_to_weaviate instead of convert_to_pinecone
* Fix _get_all_documents_in_index
* Update docstrings and docs
* Catching an exception in get_document(s)_by_id
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>