* retriever metrics added
* Add latest docstring and tutorial changes
* answer and document level matching metrics implemented
* Add latest docstring and tutorial changes
* answer related metrics for retriever
* basic reader metrics implemented
* handle no_answers
* fix typing
* fix tests
* fix tests without sas
* first draft for simulated top k
* rename sas and f1 columns in dataframe
* refactoring of EvaluationResult
* Add latest docstring and tutorial changes
* more eval tests added
* fix sas expected value precision
* distinction between ir and qa recall
* EvaluationResult.worst_queries() implemented
* print_evaluation_report() added
* eval report for QA Pipeline improved
* dynamic metrics for worst queries calc
* Add latest docstring and tutorial changes
* method names adjusted
* simple test for print_eval_report() added
* improved documentation
* Add latest docstring and tutorial changes
* minor formatting
* Add latest docstring and tutorial changes
* fix no_answer cases
* adjust one docstring
* Add latest docstring and tutorial changes
* fix no_answer cases for sas
* batchmode for sas implemented
* fix for retriever metrics if there are only no_answers
* fix multilabel tests
* improve documentation for pipeline.eval()
* streamline multilabel aggregates and docs
* Add latest docstring and tutorial changes
* fix multilabel tests
* unify document_id
* add dataframe schema description to EvaluationResult
* Add latest docstring and tutorial changes
* rename worst_queries to wrong_examples
* Add latest docstring and tutorial changes
* make query digesting standard pipelines work with pipeline.eval()
* Add latest docstring and tutorial changes
* tests for multi retriever pipelines added
* remove unnecessary import
* print_eval_report(): support all pipelines without junctions
* Add latest docstring and tutorial changes
* fix typos
* Add latest docstring and tutorial changes
* fix minor simulated_top_k bug and use memory documentstore throughout tests
* sas model param description improved
* Add latest docstring and tutorial changes
* rename recall metrics
* Add latest docstring and tutorial changes
* fix mean average precision link
* Add latest docstring and tutorial changes
* adjust sas description docstring
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Introduced an arg to add synonyms to Elasticsearch
* Added the test code, removed the whitespace formatting changes, and overwrote the relevant parts from the already existing mapping instead of creating new mapping.
* Added the test code
* Remove whitespace change
* Added the doc_string with examples and link
* Removed unnecessary spaces
* Add latest docstring and tutorial changes
* fix text_field -> content_field
Co-authored-by: sowmiya-emplay <sowmiya.j@emplay.net>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* adding yaml functionality to BaseStandardPipeline
fixes #1681
* Add latest docstring and tutorial changes
* Update API Reference Pages for v1.0 (#1729)
* Create new API pages and update existing ones
* Create query classifier page
* Remove Objects suffix
* Change answer aggregation key to doc_id, query instead of label_id, query (#1726)
* Add debugging example to tutorial (#1731)
* Add debugging example to tutorial
* Add latest docstring and tutorial changes
* Remove Objects suffix
* Add latest docstring and tutorial changes
* Revert "Remove Objects suffix"
This reverts commit 6681cb06510b080775994effe6a50bae42254be4.
* Revert unintentional commit
* Add third debugging option
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix another self.device/s typo (#1734)
* Fix yet another self.device(s) typo
* Add typing to 'initialize_device_settings' to try to prevent future issues
* Fix bug in Tutorial5
* Fix the same bug in the notebook
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* added test for saving and loading prebuilt pipelines
* fixed typo, changed variable name and added comments
* Add latest docstring and tutorial changes
* Fix a few details of some tutorials (#1733)
* Make Tutorial10 use print instead of logs and fix a typo in Tutoria15
* Add a type check in 'print_answers'
* Add same checks to print_documents and print_questions
* Make RAGenerator return Answers instead of dictionaries
* Fix RAGenerator tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix `print_answers` (#1743)
* Fix a specific path of print_answers that was assuming answers are dictionaries
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Split pipeline tests into three suites (#1755)
* Split pipeline tests into three suites
* Will this trigger the CI?
* Rename duplicate test into test_most_similar_documents_pipeline
* Fixing a bug that was probably never noticed
* Capitalize starting letter in params (#1750)
* Capitalize starting letter in params
Capitalized the starting letter in code examples for params in keeping with the latest names for nodes where first letter is capitalized.
Refer: https://github.com/deepset-ai/haystack/issues/1748
* Update standard_pipelines.py
Capitalized some starting letters in the docstrings in keeping with the updated node names for standard pipelines
* Multi query eval (#1746)
* add eval() to pipeline
* Add latest docstring and tutorial changes
* support multiple queries in eval()
* Add latest docstring and tutorial changes
* keep single query test
* fix EvaluationResult node_results default
* adjust docstrings
* Add latest docstring and tutorial changes
* minor improvements from comments
* Add latest docstring and tutorial changes
* move EvaluationResult and calculate_metrics to schema
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Split summarizer tests in order to make windows CI work again (#1757)
* separate testfile for summarizer with translation
* Add latest docstring and tutorial changes
* import SPLIT_DOCS from test_summarizer
* add workflow_dispatch to windows_ci
* add workflow_dispatch to linux_ci
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix import of EvaluationResult in test case
* exclude test_summarizer_translation.py for windows_ci (#1759)
* Pipelines now tolerate custom _debug content (#1756)
* Pipelines now tolerate custom _debug content
* Support Tables in all DocumentStores (#1744)
* Add support for tables in SQLDocumentStore, FAISSDocumentStore and MilvusDocumentStore
* Add support for WeaviateDocumentStore
* Make sure that embedded meta fields are strings + add embedding_dim to WeaviateDocStore in test config
* Add latest docstring and tutorial changes
* Represent tables in WeaviateDocumentStore as nested lists
* Fix mypy
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Allow TableReader models without aggregation classifier (#1772)
* Fix usage of filters in `/query` endpoint in REST API (#1774)
* WIP filter refactoring
* fix filter formatting
* remove inplace modification of filters
* Public demo (#1747)
* Queries now run only when pressing RUN. File upload hidden. Question is not sent if the textbox is empty.
* Add latest docstring and tutorial changes
* Tidy up: remove needless state, add comments, fix minor bugs
* Had to add results to the status to avoid some bugs in eval mode
* Added 'credits'
* Add footers, update requirements, some random questions for the evaluation
* Add requested changes
* Temporarily roll back the UI to the old GoT dataset
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Facilitate concurrent query / indexing in Elasticsearch with dense retrievers (new `skip_missing_embeddings` param) (#1762)
* Filtering records not having embeddings
* Added support for the skip_missing_embeddings flag. The default behavior is to throw an error when embeddings are missing. If skip_missing_embeddings=True, documents without embeddings are ignored for vector similarity
* Fix for below error:
haystack/document_stores/elasticsearch.py:852: error: Need type annotation for "script_score_query"
* docstring for skip_missing_embeddings parameter
* Raise exception when no documents with embeddings are found for EmbeddingRetriever.
* Default skip_missing_embeddings to True
* Explicitly check if embeddings are present if no results are returned by EmbeddingRetriever for Elasticsearch
* Added test case based on Julian's input
* Added test case based on Julian's input. Fix pytest error on the testcase
* Simplify code by using get_embed_count
* Adjust docstring & error msg slightly
* Revert error msg
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Huggingface private model support via API tokens (FARMReader) (#1775)
* passed kwargs to model loading
* Pass Auth token explicitly
* add use_auth_token to get_language_model_class
* added use_auth_token parameter at FARMReader
* Add latest docstring and tutorial changes
* added docs for parameter `use_auth_token`
* Add latest docstring and tutorial changes
* adding docs link
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* private hugging face models for retrievers (#1785)
* private dpr
* Add latest docstring and tutorial changes
* added parameters to child functions
* Add latest docstring and tutorial changes
* added tableextractor
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* ignore empty filters parameter (#1783)
* ignore empty filters parameter
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* initialize doc store with doc and label index in tutorial 5 (#1730)
* initialize doc store with doc and label index
* change ipynb according to py for tutorial 5
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Small fixes to the public demo (#1781)
* Make Streamlit tolerant to haystack not knowing its version, and add a special error for docstore issues
* Add workaround for a Streamlit bug
* Make default filters value an empty dict
* Return more context for each answer in the rest api
* Make the hs_version call non-blocking by adding a very quick timeout
* Add disclaimer on low confidence answer
* Use the no-answer feature of the reader to highlight questions with no good answer
* Upgrade torch to v1.10.0 (#1789)
* Upgrade torch to v1.10.0
* Adapt torch version for torch-scatter in TableQA tutorial
* Add latest docstring and tutorial changes
* Make torch version more flexible
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* adding yaml functionality to BaseStandardPipeline
fixes #1681
* Add latest docstring and tutorial changes
* added test for saving and loading prebuilt pipelines
* fixed typo, changed variable name and added comments
* Add latest docstring and tutorial changes
* fix code rendering for example
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Branden Chan <33759007+brandenchan@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
Co-authored-by: nishanthcgit <5066268+nishanthcgit@users.noreply.github.com>
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: C V Goudar <cvgoudar@users.noreply.github.com>
Co-authored-by: Kristof Herrmann <37148029+ArzelaAscoIi@users.noreply.github.com>
* separate testfile for summarizer with translation
* Add latest docstring and tutorial changes
* import SPLIT_DOCS from test_summarizer
* add workflow_dispatch to windows_ci
* add workflow_dispatch to linux_ci
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix a specific path of print_answers that was assuming answers are dictionaries
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Move each haystack module's logger configuration into the respective file and configure the handlers properly
* Implement most changes from #1714
* Remove accidentally committed git merge tags ':D
* Remove the debug logs capture feature
* Remove more references to debug_logs
* Fix issue with FARMReader that somehow made it to master
* Add devices parameter to Inferencer
* Change log of APEX message to DEBUG and lower the 'Starting <docstore>...' messages to DEBUG as well
* Change log level of a few logs from modeling
* Silence the transformers warning
* Remove empty line below the workers :)
* Fix two more levels in the tutorials logs
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Use initialize_device_settings in all nodes
* Set StreamHandler level to INFO
* Add latest docstring and tutorial changes
* work in progress
* Standardize device initialization
* Add latest docstring and tutorial changes
* Adapt device initialization in Reader's train method
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* create uuid and dummy embedding in weaviate doc store
* handle and test for duplicate non-uuid-formatted ids in weaviate
* add uuid and dummy embedding to doc strings
* Add latest docstring and tutorial changes
* Upgrade weaviate
* Include weaviate in common doc store test cases
* Add latest docstring and tutorial changes
* Exclude weaviate doc store from eval tests
* Incorporate index name in uuid generation
* Ignore mypy error
* Fix typo
* Restore DOCS without uuid and embeddings generated by weaviate
* Supply docs for retriever tests as fixture
* Limit scope of fixture to function instead of session
* Add comments
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Added a uniform normalization method to each of the DocStores (implemented), so that Milvus and Weaviate doc stores can now use cosine similarity, plus a future method for normalizing existing embeddings (empty for now).
* Fixed a typo.
* Fixed lots of stuff. Performed local tests.
* Fixed scores representation for cosine. Assuming Weaviate's rep needs no change.
* fixes as per discussion
* Trigger CI
* resolving conflicts
* small typo
* fixed a param type
* cleaned up some leftovers from conflict resolution
* commented out fastmath for a moment
* fixing tests
* added docstore for small vectors
* test
* fixed document_store_cosine_small
* cosine tests fixes
* fixed document_store_cosine_small
* fixed weaviate index name and lowered rtol for ES
* increased rtol
* added explicit doc_ids for weaviate, excluded ES, included Inmemory
* resolving mismatch
* fixing a typo
* flatten normalize_embedding()
* fix import for test
* standardize normalize_embeddings across doc stores
* Add latest docstring and tutorial changes
* going for the faster plain dot prod
Co-authored-by: fingoldo <fingoldo@gmail.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* ensure tf-idf matrix calculation before retrieval
* Run fit() automatically if new documents have been added
* Add latest docstring and tutorial changes
* Fix type error
* Add test case for tfidf retriever yaml pipeline
* Use InMemoryDocStore and add 2nd test case
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Truncate too large tables for TableReader
* Add documentation
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Files moved, imports all broken
* Fix most imports and docstrings into
* Fix the paths to the modules in the API docs
* Add latest docstring and tutorial changes
* Add a few pipelines that were lost in the imports
* Fix a bunch of mypy warnings
* Add latest docstring and tutorial changes
* Create a file_classifier module
* Add docs for file_classifier
* Fixed most circular imports, now the REST API can start
* Add latest docstring and tutorial changes
* Tackling more mypy issues
* Reintroduce from FARM and fix last mypy issues hopefully
* Re-enable old-style imports
* Fix some more import from the top-level package in an attempt to sort out circular imports
* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests
* Change document_store into document_stores
* Update imports in tutorials
* Add latest docstring and tutorial changes
* Probably fixes summarizer tests
* Improve the old-style import allowing module imports (should work)
* Try to fix the docs
* Remove dedicated KnowledgeGraph page from autodocs
* Remove dedicated GraphRetriever page from autodocs
* Fix generate_docstrings.sh with an updated list of yaml files to look for
* Fix some more modules in the docs
* Fix the document stores docs too
* Fix a small issue on Tutorial14
* Add latest docstring and tutorial changes
* Add deprecation warning to old-style imports
* Remove stray folder and import Dict into dense.py
* Change import path for MLFlowLogger
* Add old loggers path to the import path aliases
* Fix debug output of convert_ipynb.py
* Fix circular import on BaseRetriever
* Missed one merge block
* re-run tutorial 5
* Fix imports in tutorial 5
* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base
* Add latest docstring and tutorial changes
* Fix typo in utils __init__
* Fix a few more imports
* Fix benchmarks too
* New-style imports in test_knowledge_graph
* Rollback setup.py
* Rollback squad_to_dpr too
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add delete_labels() except for weaviate doc store
* Add latest docstring and tutorial changes
* Add test for delete_labels()
* Adapt filter for label deletion to different doc stores in test
* Allow delete labels by _id in elasticsearch
* Add latest docstring and tutorial changes
* re-add bugfix after merge
* Add ids as optional parameter
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Rename TransformersAdamW into simply AdamW (probably changed in transformers at some point)
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add endpoint to get documents by filter
* Add test for /documents/get_by_filter and extend the delete documents test
* Add rest_api/file-upload to .gitignore
* Make sure the document store is empty for each test
* Improve docstrings of delete_documents_by_filters and get_documents_by_filters
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* First rough implementation
* Add a flag to dump the debug logs to the console as well
* Typing run() and _dispatch_run()
* Allow debug and debug_logs to be passed as arguments of run()
* Avoid overwriting _debug, later we might want to store other objects in it
* Put logs under a separate key of the _debug dictionary and add input and output of the node alongside it
* Introduce global arguments for pipeline.run() that get applied to every node when defined
* Change default values of debug variables to None, otherwise their default would override the params values
* Remove a potential infinite recursion on the overridden __getattr__
* Do not append the output of the last node in the _debug key, it causes infinite recursion
* Add tests
* Move the input/output collection into _dispatch_run to gather only relevant info
* Add partial Pipeline.run() docstring
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Initial draft of TransformersClassifier
* Add transformers classifier implementation
* Add test for SentenceTransformersClassifier
* Add truncation and corresponding test case to Classifier
* Add zero-shot classification and test
* Add document classifier documentation
* Add latest docstring and tutorial changes
* print meta data with print_documents()
* Add latest docstring and tutorial changes
* Remove top_k param from Classifier usage example
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Make InMemoryDocumentStore accept and apply filters in delete_documents()
* Modify test_document_store.py to test the filtered deletion in memory, sql and milvus too
* Make FAISSDocumentStore accept and properly apply filters in delete_documents()
* Add latest docstring and tutorial changes
* Remove accidentally duplicated test
* Remove unnecessary decorators from test/test_document_store.py::test_delete_documents_with_filters
* Add embeddings count test for FAISS and Milvus; Milvus fails it.
* Fixed a bug that made Milvus not delete embeddings
* Remove batch size parametrization in tests & update all documentstore's docstrings with a filter example
* Add latest docstring and tutorial changes
Co-authored-by: prafgup <prafulgupta6@gmail.com>
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* update readme and contributing.md
* update contributing
* adjust example
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* remove unneeded GitHub Actions and reactivate docstring and tutorial generation
* test workflow
* update pydoc version
* update python version
* update watchdog
* move to latest version pydoc-markdown
* remove version check
* Add latest docstring and tutorial changes
* remove test workflow
* test for param docstrings
* pin pydoc-markdown version
* add test workflow
* pin watchdog version
* Add latest docstring and tutorial changes
* update original workflow and delete test
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>