* Add BasePipeline.validate_config, BasePipeline.validate_yaml, and some new custom exception classes
* Make error composition work properly
* Clarify typing
* Help mypy a bit more
* Update Documentation & Code Style
* Enable autogenerated docs for Milvus1 and 2 separately
* Revert "Enable autogenerated docs for Milvus1 and 2 separately"
This reverts commit 282be4a78a6e95862a9b4c924fc3dea5ca71e28d.
* Update Documentation & Code Style
* Re-enable 'additionalProperties: False'
* Add pipeline.type to JSON Schema, was somehow forgotten
* Disable additionalProperties on the pipeline properties too
* Fix json-schemas for 1.1.0 and 1.2.0 (should not do it again in the future)
* Call super in PipelineValidationError
* Improve _read_pipeline_config_from_yaml's error handling
* Fix generate_json_schema.py to include document stores
* Fix json schemas (retro-fix 1.1.0 again)
* Improve custom errors printing, add link to docs
* Add function in BaseComponent to list its subclasses in a module
* Make some document stores base classes abstract
* Add marker 'integration' in pytest flags
* Slightly improve validation of pipelines at load
* Adding tests for YAML loading and validation
* Make custom_query Optional for validation issues
* Fix bug in _read_pipeline_config_from_yaml
* Improve error handling in BasePipeline and Pipeline and add DAG check
* Move json schema generation into haystack/nodes/_json_schema.py (useful for tests)
* Simplify errors slightly
* Add some YAML validation tests
* Remove load_from_config from BasePipeline, it was never used anyway
* Improve tests
* Include json-schemas in package
* Fix conftest imports
* Make BasePipeline abstract
* Improve mocking by making the test independent from the YAML version
* Add exportable_to_yaml decorator to forget about set_config on mock nodes
* Fix mypy errors
* Comment out one monkeypatch
* Fix typing again
* Improve error message for validation
* Add required properties to pipelines
* Fix YAML version for REST API YAMLs to 1.2.0
* Fix load_from_yaml call in load_from_deepset_cloud
* fix HaystackError.__getattr__
* Add super().__init__() in most nodes and docstore, comment set_config
* Remove type from REST API pipelines
* Remove useless init from doc2answers
* Call super in Seq2SeqGenerator
* Typo in deepsetcloud.py
* Fix rest api indexing error mismatch and mock version of JSON schema in all tests
* Working on pipeline tests
* Improve errors printing slightly
* Add back test_pipeline.yaml
* _json_schema.py supports different versions with identical schemas
* Add type to 0.7 schema for backwards compatibility
* Fix small bug in _json_schema.py
* Try alternative to generate json schemas on the CI
* Update Documentation & Code Style
* Make linux CI match autoformat CI
* Fix super-init-not-called
* Accidentally committed file
* Update Documentation & Code Style
* fix test_summarizer_translation.py's import
* Mock YAML in a few suites, split and simplify test_pipeline_debug_and_validation.py::test_invalid_run_args
* Fix json schema for ray tests too
* Update Documentation & Code Style
* Reintroduce validation
* Use unstable version in tests and rest api
* Make unstable support the latest versions
* Update Documentation & Code Style
* Remove needless fixture
* Make type in pipeline optional in the strings validation
* Fix schemas
* Fix string validation for pipeline type
* Improve validate_config_strings
* Remove type from test pipelines
* Update Documentation & Code Style
* Fix test_pipeline
* Removing more type from pipelines
* Temporary CI patch
* Fix issue with exportable_to_yaml never invoking the wrapped init
* rm stray file
* pipeline tests are green again
* Linux CI now needs .[all] to generate the schema
* Bugfixes, pipeline tests seem to be green
* Typo in version after merge
* Implement missing methods in Weaviate
* Trying to avoid FAISS tests from running in the Milvus1 test suite
* Fix some stray test paths and faiss index dumping
* Fix pytest markers list
* Temporarily disable cache to be able to see tests failures
* Fix pyproject.toml syntax
* Use only tmp_path
* Fix preprocessor signature after merge
* Fix faiss bug
* Fix Ray test
* Fix documentation issue by removing quotes from faiss type
* Update Documentation & Code Style
* use document properly in preprocessor tests
* Update Documentation & Code Style
* make preprocessor capable of handling documents
* import document
* Revert support for documents in preprocessor, do later
* Fix bug in _json_schema.py that was breaking validation
* re-enable cache
* Update Documentation & Code Style
* Simplify calling _json_schema.py from the CI
* Remove redundant ABC inheritance
* Ensure exportable_to_yaml works only on implementations
* Rename subclass to class_ in Meta
* Make run() and get_config() abstract in BasePipeline
* Revert unintended change in preprocessor
* Move outgoing_edges_input_node check inside try block
* Rename VALID_CODE_GEN_INPUT_REGEX into VALID_INPUT_REGEX
* Add check for a RecursionError on validate_config_strings
* Address usages of _pipeline_config in data silo and elasticsearch
* Rename _pipeline_config into _init_parameters
* Fix pytest marker and remove unused imports
* Remove most redundant ABCs
* Rename _init_parameters into _component_configuration
* Remove set_config and type from _component_configuration's dict
* Remove last instances of set_config and replace with super().__init__()
* Implement __init_subclass__ approach
* Simplify checks on the existence of _component_configuration
* Fix faiss issue
* Dynamic generation of node schemas & weed out old schemas
* Add debatable test
* Add docstring to debatable test
* Positive diff between schemas implemented
* Improve diff printing
* Rename REST API YAML files to trigger IDE validation
* Fix typing issues
* Fix more typing
* Typo in YAML filename
* Remove needless type:ignore
* Add tests
* Fix tests & validation feedback for accessory classes in custom nodes
* Refactor RAGeneratorType out
* Fix broken import in conftest
* Improve source error handling
* Remove unused import in test_eval.py breaking tests
* Fix changed error message in tests matches too
* Normalize generate_openapi_specs.py and generate_json_schema.py in the actions
* Fix path to generate_openapi_specs.py in autoformat.yml
* Update Documentation & Code Style
* Add test for FAISSDocumentStore-like situations (superclass with init params)
* Update Documentation & Code Style
* Fix indentation
* Remove commented set_config
* Store model_name_or_path in FARMReader to use in DistillationDataSilo
* Rename _component_configuration into _component_config
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* update remaining occurrences of get_connection
* fix milvus2 import and fix wrong extra references
* change MilvusDocumentStore to Milvus1DocumentStore
* update milvus docstrings to reflect updated dependency management
* enable milvus 2 tests
* fix milvus2 env variable processing
* fix dropping collections for each milvus 2 test
* make Milvus 2 doc store tests work
* allow user to specify consistency level
* First attempt at running Milvus2 in the CI
* Install the correct pymilvus
* add batch deletion for milvus2
* change default from milvus 1 to milvus 2
* make milvus2 the default in the docstores extra
* Switch milvus1 and milvus2 in base test run on CI
* Rename docstore flags for pytest: 'milvus'->'milvus1', 'milvus2'->'milvus'
* Rename milvus.py->milvus1.py and milvus2x.py->milvus2.py
* Enable autogenerated docs for Milvus1 and 2 separately
* Partial fix to docstring of Milvus2DocumentStore
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Michel Bartels <kontakt@michelbartels.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* move commandline args to global conftest
* correct test exclude paths
* Update Documentation & Code Style
* exclude test_generator_pipeline_with_translator from windows ci
* exclude further oom tests
* enable log_cli
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* pass documents as extra param to eval
* pass documents via labels to eval
* rename param in docs
* Update Documentation & Code Style
* Revert "rename param in docs"
This reverts commit 2f4c2ec79575e9dd33a8300785f789a327df36f4.
* Revert "pass documents via labels to eval"
This reverts commit dcc51e41f2637d093d81c7d193b873c17c36b174.
* simplify iterating through labels and docs
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* add filters attribute to labels and use in eval
* Add latest docstring and tutorial changes
* overwrite params if None
* populate filters from Label to MultiLabel
* add query_id in eval df and deepcopy params for each label
* fix mypy
* add test for aggregating filters in multilabel
* use query ids also in answers df
* loop through unique query_ids
* hash filters and query text as id
* Add latest docstring and tutorial changes
* fix top_k reader eval
* Apply Black
* rename query_id to id/multilabel_id
* Apply Black
* json dump filters in dataframe
* add filters and id to wrong_examples()
* Apply Black
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* Testing black on ui/
* Applying black on docstores
* Add latest docstring and tutorial changes
* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too
* Remove comments
* Relax constraints on pydoc-markdown
* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade
* Fix a couple of bugs
* Add a type: ignore that was missing somehow
* Give path to black
* Apply Black
* Apply Black
* Relocate a couple of type: ignore
* Update documentation
* Make Linux CI run after applying Black
* Triggering Black
* Apply Black
* Remove dependency, does not work well
* Manually remove double trailing commas
* Update documentation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* First attempt at using setup.cfg for dependency management
* Trying the new package on the CI and in Docker too
* Add composite extras_require
* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports
* Fix bug on class import and rephrase error message
* Introduce typing for optional modules and add type: ignore in sparse.py
* Include importlib_metadata backport for py3.7
* Add colab group to extra_requires
* Fix pillow version
* Fix grpcio
* Separate out the crawler as another extra
* Make paths relative in rest_api and ui
* Update the test matrix in the CI
* Add try catch statements around the optional imports too to account for direct imports
* Never mix direct deps with self-references and add ES deps to the base install
* Refactor several paths in tests to make them insensitive to the execution path
* Include tstadel review and re-introduce Milvus1 in the tests suite, to fix
* Wrap pdf conversion utils into safe_import
* Update some tutorials and revert to Milvus1 as default for now, see #2067
* Fix mypy config
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* run predictions on ground-truth docs in reader
* build dataframe for closed/open domain eval
* fix looping through multilabel
* fix looping through multilabel's list of labels
* simplify collecting relevant docs
* switch closed-domain eval off by default
* Add latest docstring and tutorial changes
* handle edge case params not given
* renaming & generate pipeline eval report
* add test case for closed-domain eval metrics
* Add latest docstring and tutorial changes
* test report of closed-domain eval
* report closed-domain metrics only for answer metrics not doc metrics
* refactoring
* fix mypy & remove comment
* add second for-loop & use answer as method input
* renaming & add separate loop building docs eval df
* Add latest docstring and tutorial changes
* change column order for evaluation dataframe (#1957)
* added missing eval column node_input
* generic order for both document and answer returning nodes; ensure no columns get lost
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
* fix column reordering after renaming of node_input
* simplify tests & add docu
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ju-gu <87523290+ju-gu@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
* set fixture scope to "function"
* run FARMReader without multiprocessing
* dispose of ray after tests
* run most expensive tasks first in test files
* run expensive tests first
* run garbage collector between tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* disable problematic eval tests for windows ci
* move standard pipeline eval tests to separate test file
* switch to elasticsearch documentstore to reduce inproc mem
* Revert "switch to elasticsearch documentstore to reduce inproc mem"
This reverts commit 7a75871909c3317a252dff3a4df17e99eff69d05.
* get retriever from conftest
* use smaller embedding model for summarizer
* use smaller summarizer model
* remove queries param from pipeline.eval()
* isolate problematic tests
* rename separate test file
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* retriever metrics added
* Add latest docstring and tutorial changes
* answer and document level matching metrics implemented
* Add latest docstring and tutorial changes
* answer related metrics for retriever
* basic reader metrics implemented
* handle no_answers
* fix typing
* fix tests
* fix tests without sas
* first draft for simulated top k
* rename sas and f1 columns in dataframe
* refactoring of EvaluationResult
* Add latest docstring and tutorial changes
* more eval tests added
* fix sas expected value precision
* distinction between ir and qa recall
* EvaluationResult.worst_queries() implemented
* print_evaluation_report() added
* eval report for QA Pipeline improved
* dynamic metrics for worst queries calc
* Add latest docstring and tutorial changes
* method names adjusted
* simple test for print_eval_report() added
* improved documentation
* Add latest docstring and tutorial changes
* minor formatting
* Add latest docstring and tutorial changes
* fix no_answer cases
* adjust one docstring
* Add latest docstring and tutorial changes
* fix no_answer cases for sas
* batchmode for sas implemented
* fix for retriever metrics if there are only no_answers
* fix multilabel tests
* improve documentation for pipeline.eval()
* streamline multilabel aggregates and docs
* Add latest docstring and tutorial changes
* fix multilabel tests
* unify document_id
* add dataframe schema description to EvaluationResult
* Add latest docstring and tutorial changes
* rename worst_queries to wrong_examples
* Add latest docstring and tutorial changes
* make query digesting standard pipelines work with pipeline.eval()
* Add latest docstring and tutorial changes
* tests for multi retriever pipelines added
* remove unnecessary import
* print_eval_report(): support all pipelines without junctions
* Add latest docstring and tutorial changes
* fix typos
* Add latest docstring and tutorial changes
* fix minor simulated_top_k bug and use memory documentstore throughout tests
* sas model param description improved
* Add latest docstring and tutorial changes
* rename recall metrics
* Add latest docstring and tutorial changes
* fix mean average precision link
* Add latest docstring and tutorial changes
* adjust sas description docstring
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* create uuid and dummy embedding in weaviate doc store
* handle and test for duplicate non-uuid-formatted ids in weaviate
* add uuid and dummy embedding to doc strings
* Add latest docstring and tutorial changes
* Upgrade weaviate
* Include weaviate in common doc store test cases
* Add latest docstring and tutorial changes
* Exclude weaviate doc store from eval tests
* Incorporate index name in uuid generation
* Ignore mypy error
* Fix typo
* Restore DOCS without uuid and embeddings generated by weaviate
* Supply docs for retriever tests as fixture
* Limit scope of fixture to function instead of session
* Add comments
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Files moved, imports all broken
* Fix most imports and docstrings into
* Fix the paths to the modules in the API docs
* Add latest docstring and tutorial changes
* Add a few pipelines that were lost in the imports
* Fix a bunch of mypy warnings
* Add latest docstring and tutorial changes
* Create a file_classifier module
* Add docs for file_classifier
* Fixed most circular imports, now the REST API can start
* Add latest docstring and tutorial changes
* Tackling more mypy issues
* Reintroduce from FARM and fix last mypy issues hopefully
* Re-enable old-style imports
* Fix some more import from the top-level package in an attempt to sort out circular imports
* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests
* Change document_store into document_stores
* Update imports in tutorials
* Add latest docstring and tutorial changes
* Probably fixes summarizer tests
* Improve the old-style import allowing module imports (should work)
* Try to fix the docs
* Remove dedicated KnowledgeGraph page from autodocs
* Remove dedicated GraphRetriever page from autodocs
* Fix generate_docstrings.sh with an updated list of yaml files to look for
* Fix some more modules in the docs
* Fix the document stores docs too
* Fix a small issue on Tutorial14
* Add latest docstring and tutorial changes
* Add deprecation warning to old-style imports
* Remove stray folder and import Dict into dense.py
* Change import path for MLFlowLogger
* Add old loggers path to the import path aliases
* Fix debug output of convert_ipynb.py
* Fix circular import on BaseRetriever
* Missed one merge block
* re-run tutorial 5
* Fix imports in tutorial 5
* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base
* Add latest docstring and tutorial changes
* Fix typo in utils __init__
* Fix a few more imports
* Fix benchmarks too
* New-style imports in test_knowledge_graph
* Rollback setup.py
* Rollback squad_to_dpr too
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add node names validation
* Add tests
* Improve test and test that params exist before validating
* Fix the REST API
* Use minilm-uncased-squad2 instead of roberta-base-squad2
* Use roberta model for test_pipeline.yaml
* Turn off TOKENIZERS_PARALLELISM in generator tests (#1605)
* Account for non-targeted parameters
* Restore previous parameters handling in the rest api
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* update readme and contributing.md
* update contributing
* adjust example
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix duplicate question in Reader.eval()
* Add duplicate question support in document store
* Support duplicate questions in retriever eval
* Update tutorial
* Rename key_tuple
* Change error message
* Add warning when more than 6 labels
* Allow for label grouping options
* Add support for aggregating by label meta
* Satisfy mypy
* Make label field flexible, add docstrings
* Satisfy mypy
* Fix failing tests
* Adjust docstring
* Fix tutorial
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Adding ranker similar to retriever and reader
* Sort documents according to query-document similarity scores
* Reranking and model training runs for small example
* Added EvalRanker node
* Calculate recall@k in EvalRetriever and EvalRanker nodes
* Renaming EvalRetriever to EvalDocuments and EvalReader to EvalAnswers
* Added mean reciprocal rank as metric for EvalDocuments
* Fix bug that appeared when ranking documents with same score
* Remove commented code for unimplemented eval() of Ranker node
* Add documentation of k parameter in EvalDocuments
* Add Ranker docu and renaming top_k param
* Allow filtering of duplicate answers as implemented in FARM
* Changed default behavior to filtering exact duplicates
* Change expected test result due to filtering of duplicate answers by default
* Rounding expected test results for comparison with predictions
* Make batchwise adding of evaluation data possible
* Fix typos in docstrings
* Merge add_eval_data and add_eval_data_batchwise
* Improve import statements
* Move add_eval_data to BaseDocumentStore
* Add batch_size param to write_documents and write_labels in EsDocStore
* Adjust docstring
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Adding dummy generator implementation
* Adding tutorial to try the model
* Committing current non working code
* Committing current update where we need to call the generate function directly and convert the embeddings to tensors
* Addressing review comments.
* Refactoring finder, and implementing rag_generator class.
* Refined the implementation of RAGGenerator and now it is in clean shape
* Renaming RAGGenerator to RAGenerator
* Reverting change from finder.py and addressing review comments
* Remove support for RagSequenceForGeneration
* Utilizing embed_passage function from DensePassageRetriever
* Adding sample test data to verify generator output
* Updating testing script
* Updating testing script
* Fixing bug related to top_k
* Updating latest farm dependency
* Comment out farm dependency
* Reverting changes from TransformersReader
* Adding transformers dataset to compare transformers and haystack generator implementation
* Using generator_encoder instead of question_encoder to generate context_input_ids
* Adding workaround to install FARM dependency from master branch
* Removing unnecessary changes
* Fixing generator test
* Removing transformers datasets
* Fixing generator test
* Some cleanup and updating TODO comments
* Adding tutorial notebook
* Updating tutorials with comments
* Explicitly passing token model in RAG test
* Addressing review comments
* Fixing notebook
* Refactoring tests to reduce memory footprint
* Split generator tests in separate ci step and before running it reclaim memory by terminating containers
* Moving tika dependent test to separate dir
* Remove unwanted code
* Bringing reader under session scope
* Farm is now session object hence restoring changes from default value
* Updating assert for pdf converter
* Dummy commit to trigger CI flow
* Reducing memory footprint required for generator tests
* Fixing mypy issues
* Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits
* reducing changes
* Fixing CI
* changing elastic search ci
* Fixing test error
* Disabling return of embedding
* Marking generator test as well
* Refactoring tutorials
* Increasing ES memory to 750M
* Trying another fix for ES CI
* Reverting CI changes
* Splitting tests in CI
* Generator and non-generator markers split
* Adding pytest.ini to add markers and enable strict-markers option
* Reducing elastic search container memory
* Simplifying generator test by using documents with embedding directly
* Bump up farm to 0.5.0
* 1. Prevent the update_embeddings function in FAISSDocumentStore from setting faiss_index to None when the document store does not have any docs.
2. Clean up tests by adding a fixture for the retriever.
* TfidfRetriever needs a document store with documents during initialization because it calls fit() in its constructor; fix this by checking whether self.paragraphs is None
* Fix naming of retriever's fixture (embedded to embedding and tfid to tfidf)