* Add docstrings to the REST API endpoint to have them included in the OpenAPI specs
* Attempt at making GitHub CI generate the OpenAPI specs
* Missing __init__.py was breaking rest_api import
* Add comment on dummy pipeline
* Create separate workflow file for the OpenAPI specs generation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
* Remove stray requirements.txt files and update README.md
* Remove requirement files
* Add details about pip bug and link to setup.cfg
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* First attempt at using setup.cfg for dependency management
* Trying the new package on the CI and in Docker too
* Add composite extras_require
* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports
* Fix bug on class import and rephrase error message
* Introduce typing for optional modules and add type: ignore in sparse.py
* Include importlib_metadata backport for py3.7
* Add colab group to extras_require
* Fix pillow version
* Fix grpcio
* Separate out the crawler as another extra
* Make paths relative in rest_api and ui
* Update the test matrix in the CI
* Add try catch statements around the optional imports too to account for direct imports
* Never mix direct deps with self-references and add ES deps to the base install
* Refactor several paths in tests to make them insensitive to the execution path
* Include tstadel review and re-introduce Milvus1 in the test suite, to fix
* Wrap pdf conversion utils into safe_import
* Update some tutorials and revert to Milvus1 as default for now, see #2067
* Fix mypy config
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix data augmentation path in finetuning notebook
* Add latest docstring and tutorial changes
* make distillation possible with other models than BERT
* use smaller dataset for distillation in finetuning tutorial
* Add latest docstring and tutorial changes
* make data augmentation in finetuning faster
* update language models forward doc strings
* fix return type of language models
* remove debug output
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* doc store should return all documents matching ids passed to get_documents_by_id
* test for get_document_by_id should be named correctly
* add test for get_documents_by_id
* Add latest docstring and tutorial changes
* document es query limit
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* minimal DCDocumentStore
* support filters
* implement get_documents_by_id
* handle not existing documents
* add docstrings
* auth added
* add tests
* generate docs
* Add latest docstring and tutorial changes
* add responses to dev dependencies
* fix tests
* support query() and query_by_embedding()
* Add latest docstring and tutorial changes
* query tests added
* read api_key and api_endpoint from env
* Add latest docstring and tutorial changes
* support query() and query_by_embedding()
* query tests added
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
* support dynamic similarity and return_embedding values
* Add latest docstring and tutorial changes
* adjust KeywordDocumentStore description
* refactoring
* Add latest docstring and tutorial changes
* implement get_document_count and raise on all not implemented methods
* Add latest docstring and tutorial changes
* don't use abbreviation DC in comments and errors
* Add latest docstring and tutorial changes
* docstring added to KeywordDocumentStore
* Add latest docstring and tutorial changes
* enhanced api key set
* split tests into two parts
* change setup.py in order to work around build cache
* added link
* Add latest docstring and tutorial changes
* rename DCDocumentStore to DeepsetCloudDocumentStore
* Add latest docstring and tutorial changes
* remove dc.py
* reinsert link to docs
* fix imports
* Add latest docstring and tutorial changes
* better test structure
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>
* add ndcg and eval_mode to docstrings and reorder dataframe columns in docs
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* add parameters to allow for different hyperparameters in stage 1 and 2 of tinybert distillation
* Add latest docstring and tutorial changes
* improve default parameters
* Add latest docstring and tutorial changes
* split up distillation method
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* run predictions on ground-truth docs in reader
* build dataframe for closed/open domain eval
* fix looping through multilabel
* fix looping through multilabel's list of labels
* simplify collecting relevant docs
* switch closed-domain eval off by default
* Add latest docstring and tutorial changes
* handle edge case params not given
* renaming & generate pipeline eval report
* add test case for closed-domain eval metrics
* Add latest docstring and tutorial changes
* test report of closed-domain eval
* report closed-domain metrics only for answer metrics not doc metrics
* refactoring
* fix mypy & remove comment
* add second for-loop & use answer as method input
* renaming & add separate loop building docs eval df
* Add latest docstring and tutorial changes
* change column order for evaluation dataframe (#1957)
* change column order for evaluation dataframe
* added missing eval column node_input
* generic order for both document and answer returning nodes; ensure no columns get lost
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
* fix column reordering after renaming of node_input
* simplify tests & add docu
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ju-gu <87523290+ju-gu@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Co-authored-by: Thomas Stadelmann <thomas.stadelmann@deepset.ai>
* Properly fix MetaDocumentORM and MetaLabelORM with composite foreign key constraints
* update_document_meta() was not using index properly
* Exclude ES and Memory from the cosine_sanity_check test
* move ensure_ids_are_correct_uuids in conftest and move one test back to faiss & milvus suite
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* align document store similarity functions
* remove unnecessary imports
* undo accidental change
* stopped weaviate from pretending to support dot product similarity
* stopped weaviate from pretending to support dot product similarity
* Add latest docstring and tutorial changes
* fix fixture params for document stores
* use cosine similarity for most tests
* fix cosine similarity test
* fix faiss test
* fix weaviate test
* fix accidental deletion
* fix document_store fixture
* test fix; shouldn't be merged
* fix test_normalize_embeddings_diff_shapes
* probably a better fix
* fix for parameter combinations
* revert new pytest_generate_tests functionality
* simplify pytest_generate_tests
* normalize embeddings for test_dpr_embedding
* add to faiss doc that embeddings are normalized
* Add latest docstring and tutorial changes
* remove unnecessary parameters and add comments
* simplify two lines of memory.py into one
* test similarity scores with smaller language model
* fix test_similarity_score
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
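Several entries above concern normalized embeddings and aligning similarity functions across document stores. The underlying fact is that once vectors are scaled to unit length, their dot product equals their cosine similarity. A small standard-library sketch of that relationship follows; the helper names are illustrative and not taken from the code base.

```python
import math
from typing import List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        return list(vec)
    return [x / norm for x in vec]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equally sized vectors."""
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two non-zero vectors."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


query = [3.0, 4.0]
doc = [4.0, 3.0]

# On raw vectors the two scores differ; after normalization they agree.
raw_dot = dot(query, doc)
cos = cosine_similarity(query, doc)
normalized_dot = dot(normalize(query), normalize(doc))
```

This is why a store that only offers dot product can still serve cosine-similarity queries, provided embeddings are normalized before being written and before being queried.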
* Refactored code to unify vector_dim and embedding_dim parameter in DocumentStores
* Unit test cases updated to use `embedding_dim` instead of `vector_dim`
* Unit test case update to use embedding_dim instead of vector_dim
* Add latest docstring and tutorial changes
* Put usage of `vector_dim` param in same if-block as corresponding warning
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
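The renaming of `vector_dim` to `embedding_dim`, with the old parameter's usage kept in the same if-block as its warning, can be sketched as below. The function name, the default of 768, and the warning text are assumptions for illustration only; the actual document store constructors may differ.

```python
import warnings
from typing import Optional


def resolve_embedding_dim(embedding_dim: int = 768, vector_dim: Optional[int] = None) -> int:
    """Return the dimension to use, honoring the deprecated `vector_dim` if given."""
    if vector_dim is not None:
        # The warning and the use of the old value live in the same branch,
        # so the deprecated parameter cannot be consumed silently.
        warnings.warn(
            "The 'vector_dim' parameter is deprecated; use 'embedding_dim' instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        return vector_dim
    return embedding_dim
```

Callers that still pass `vector_dim=512` keep working but see a `DeprecationWarning`; callers that pass only `embedding_dim` are unaffected.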
* check multiprocessing sharing strategy is available
* Change default of multiprocessing strategy to None
* Change default sharing strategy to None in retriever
* Add latest docstring and tutorial changes
* Make logging message easier to understand
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix #1687
* fix RuntimeError: received 0 items of ancdata
* Add an arg multiprocessing_strategy to DataSilo and DPR.train()
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add ParsrConverter
* Fix typing error + add Parsr to Linux CI
* Fix valid_language for all converters + fix context generation for ParsrConverter
* Remove ParsrConverter test from WindowsCI
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Replace old tutorial 5 with new code based on test cases
* Add latest docstring and tutorial changes
* Use pipeline.eval() in tutorial
* Add latest docstring and tutorial changes
* Restructure notebook
* Add latest docstring and tutorial changes
* Add dataframe example
* Add latest docstring and tutorial changes
* Get eval data from doc store
* Add latest docstring and tutorial changes
* Load data from doc store
* Add latest docstring and tutorial changes
* Clear outputs
* Add latest docstring and tutorial changes
* Change example and add python script
* Add latest docstring and tutorial changes
* Fetch aggregated multilabels from doc store
* Add latest docstring and tutorial changes
* Incorporate review feedback on text comments
* Add latest docstring and tutorial changes
* Add Notebook output
* Remove queries param from pipeline.eval()
* Add latest docstring and tutorial changes
* Add output with all metrics
* Add printing of multiple metrics to script
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* retriever metrics added
* Add latest docstring and tutorial changes
* answer and document level matching metrics implemented
* Add latest docstring and tutorial changes
* answer related metrics for retriever
* basic reader metrics implemented
* handle no_answers
* fix typing
* fix tests
* fix tests without sas
* first draft for simulated top k
* rename sas and f1 columns in dataframe
* refactoring of EvaluationResult
* Add latest docstring and tutorial changes
* more eval tests added
* fix sas expected value precision
* distinction between ir and qa recall
* EvaluationResult.worst_queries() implemented
* print_evaluation_report() added
* eval report for QA Pipeline improved
* dynamic metrics for worst queries calc
* Add latest docstring and tutorial changes
* method names adjusted
* simple test for print_eval_report() added
* improved documentation
* Add latest docstring and tutorial changes
* minor formatting
* Add latest docstring and tutorial changes
* fix no_answer cases
* adjust one docstring
* Add latest docstring and tutorial changes
* fix no_answer cases for sas
* batchmode for sas implemented
* fix for retriever metrics if there are only no_answers
* fix multilabel tests
* improve documentation for pipeline.eval()
* streamline multilabel aggregates and docs
* Add latest docstring and tutorial changes
* fix multilabel tests
* unify document_id
* add dataframe schema description to EvaluationResult
* Add latest docstring and tutorial changes
* rename worst_queries to wrong_examples
* Add latest docstring and tutorial changes
* make query digesting standard pipelines work with pipeline.eval()
* Add latest docstring and tutorial changes
* tests for multi retriever pipelines added
* remove unnecessary import
* print_eval_report(): support all pipelines without junctions
* Add latest docstring and tutorial changes
* fix typos
* Add latest docstring and tutorial changes
* fix minor simulated_top_k bug and use memory documentstore throughout tests
* sas model param description improved
* Add latest docstring and tutorial changes
* rename recall metrics
* Add latest docstring and tutorial changes
* fix mean average precision link
* Add latest docstring and tutorial changes
* adjust sas description docstring
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Add FormRecognizerConverter
* Change signature of convert method + change return type of all converters
* Adapt preprocessing util to new return type of converters
* Parametrize number of lines used for surrounding context of table
* Change name from FormRecognizerConverter to AzureConverter
* Set version of azure-ai-formrecognizer package
* Change tutorial 8 based on new return type of converters
* Add tests
* Add latest docstring and tutorial changes
* Fix typo
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Fix link to colab notebook in tutorial 16
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Introduced an arg to add synonyms to Elasticsearch
* Added the test code, removed the whitespace formatting changes, and overwrote the relevant parts from the already existing mapping instead of creating new mapping.
* Added the test code
* Remove whitespace change
* Added the doc_string with examples and link
* Removed unnecessary spaces
* Add latest docstring and tutorial changes
* fix text_field -> content_field
Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>