* [document_stores] Add a progress bar in update_embeddings() to track overall document progress, closed #1037
* change 2nd level loop to docs. switch to tqdm.auto.
* [document_stores] Elasticsearch new method get_document_without_embedding_count() added.
* [test_case] Elasticsearch documentstore get_document_without_embedding_count() test case added.
* [document_stores] Add new bool arg to get_document_count() method, fixed #1082
* [document_stores] typo fixed #1082
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* using text hash as id to prevent document duplication. Also providing a way to customize it.
* Add latest docstring and tutorial changes
* Fixing duplicate value test when text is same
* Adding test for duplicate ids in document store
* Changing exception to generic Exception type
* add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
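The hash-based id idea from the commits above can be sketched roughly as follows. This is only an illustration, not the code that was merged: the function name `make_document_id` and the choice of SHA-256 are assumptions, and `id_hash_keys` is modeled here simply as extra strings mixed into the hash.

```python
import hashlib
from typing import List, Optional


def make_document_id(text: str, id_hash_keys: Optional[List[str]] = None) -> str:
    """Hypothetical helper: derive a deterministic id from the document text.

    Identical text always yields the same id, so a store can detect duplicates.
    Extra keys (assumed to be plain strings here) let callers customize the id.
    """
    parts = [text] + list(id_hash_keys or [])
    # SHA-256 is an assumption for this sketch; any stable hash would do.
    return hashlib.sha256(":".join(parts).encode("utf-8")).hexdigest()


first = make_document_id("Berlin is the capital of Germany.")
second = make_document_id("Berlin is the capital of Germany.")
custom = make_document_id("Berlin is the capital of Germany.", ["wiki"])
print(first == second)  # same text, same id
print(first == custom)  # extra key changes the id
```

With ids derived this way, writing the same text twice produces an id collision that the document store can reject or skip.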
* Allow filtering of duplicate answers as implemented in FARM
* Changed default behavior to filtering exact duplicates
* Change expected test result due to filtering of duplicate answers by default
* Rounding expected test results for comparison with predictions
* Create knowledge_graph.md
* add doc strings to Text2SparqlRetriever
* Add doc strings to GraphDBKnowledgeGraph
* Make method calls unambiguous so it's clear which class is meant
* first running version of eval mode
* restructuring, new naming of elements and testing
* add new files to Docker, how to start with Haystack reference, remove not needed dependencies
* Add latest docstring and tutorial changes
* merged changes
* fixing bugs after breaking changes from last release
* newer version of states in streamlit, more docs for eval mode, eval file as env variable
* eval file as env variable
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add knowledge graph module
* Fix type hint
* Add graph retriever module
* Change type annotations, change return format
* Add graph retriever that executes questions as sparql queries
* Linking only those entities that are in the knowledge graph
* Added logging and using relations extracted from Knowledge graph for linking
* Preventing entity linking from linking the same token to multiple entities
* Pruning triples that have no variables for select and count queries
* Support knowledge graphs with Pipelines
* Add text2sparql
* Entity linking and relation linking consider more special cases now based on evaluation on labelled data
* Separating example code from KGQA implementation
* Add eval on combined extractive and kg questions
* Remove references to hp-test
* Add fields sparql_query and long_answer_list to metadata
* Removing modular Question2SPARQL approach
* Removing additional classes used for modular kgqa approach
* preparing lcquad data
* change graph db
* Translating namespaces in knowledge graph queries
* Creating graphdb index and loading triples from .ttl file
* Fetching graph config files, triples and model from S3
* Fix incompatibility issues with BaseGraphRetriever and BaseComponent
* Removing unused utility functions
* Adding doc strings and tutorial header
* Adding sparqlwrapper dependency
* Moving tutorial header
* Sorting tutorials by number within name of notebook
* Add latest docstring and tutorial changes
* Creating test cases for knowledge graph
* Changing knowledge graph example to harry potter
* Add latest docstring and tutorial changes
* Adapting the tutorial notebook to harry potter example
* Add GraphDB fixture for tests
* Add latest docstring and tutorial changes
* Added GraphDB docker launch to CI
* Use correct GraphDB fixture
* Check if GraphDB instance is already running
* Renaming question/query and incorporating other feedback from Timo and Tanay
* Removed type annotation
* Add latest docstring and tutorial changes
Co-authored-by: oryx1729 <oryx1729@protonmail.com>
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
A proposal to reduce the precision shown in the `EvalRetriever.print` and `EvalReader.print` to 4 significant figures. If the user wants the full precision, they can access the class attributes directly.
Before
```
Retriever
-----------------
has_answer recall: 0.8739495798319328 (208/238)
no_answer recall: 1.00 (120/120) (no_answer samples are always treated as correctly retrieved)
recall: 0.9162011173184358 (328 / 358)
```
After
```
Retriever
-----------------
has_answer recall: 0.8739 (208/238)
no_answer recall: 1.00 (120/120) (no_answer samples are always treated as correctly retrieved)
recall: 0.9162 (328 / 358)
```
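A minimal sketch of how the shortened output could be produced, assuming a plain format specifier is used. The helper name `format_recall` is hypothetical and is not part of `EvalRetriever` or `EvalReader`. Note that `.4f` fixes four decimal places, which coincides with four significant figures for recall values between 0.1 and 1.

```python
def format_recall(correct: int, total: int) -> str:
    """Hypothetical helper: recall rounded to four decimal places plus raw counts."""
    recall = correct / total
    return f"{recall:.4f} ({correct}/{total})"


print("has_answer recall:", format_recall(208, 238))
print("recall:", format_recall(328, 358))
```

The unrounded value stays available to callers because only the printed string is shortened.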
If the first query in the evaluation returns a document with `no_answer=True`, a division by zero error is raised because neither `self.has_answer_correct` nor `self.has_answer_count` has been incremented yet. This fix moves the `self.has_answer_recall` calculation inside the if-else block.
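The shape of the fix can be illustrated with a simplified counter. This is a sketch under assumptions, not the actual evaluation node: the class `RecallTracker` and its `update` method are hypothetical, and only the attribute names `has_answer_count`, `has_answer_correct`, and `has_answer_recall` come from the description above.

```python
class RecallTracker:
    """Hypothetical stand-in for the evaluation node, reduced to its counters."""

    def __init__(self) -> None:
        self.no_answer_count = 0
        self.has_answer_count = 0
        self.has_answer_correct = 0
        self.has_answer_recall = 0.0

    def update(self, no_answer: bool, correct: bool) -> None:
        if no_answer:
            # no_answer samples never touch the has_answer counters
            self.no_answer_count += 1
        else:
            self.has_answer_count += 1
            self.has_answer_correct += int(correct)
            # Computed only in this branch, so has_answer_count is at least 1
            self.has_answer_recall = self.has_answer_correct / self.has_answer_count


tracker = RecallTracker()
tracker.update(no_answer=True, correct=True)   # first query is a no_answer sample
tracker.update(no_answer=False, correct=True)
tracker.update(no_answer=False, correct=False)
print(tracker.has_answer_recall)
```

Because the division happens only after `has_answer_count` has been incremented, a leading `no_answer` sample can no longer trigger a `ZeroDivisionError`.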