976 Commits

Author SHA1 Message Date
Ikram Ali
4ab1bc3c3e
Improve the progress bar in update_embeddings() + Fix filters in update_embeddings() (#1063)
* [document_stores] Add a progress bar in update_embeddings() to track overall document progress, closes #1037

* change 2nd level loop to docs. switch to tqdm.auto.

* [document_stores] Elasticsearch new method get_document_without_embedding_count() added.

* [test_case] Elasticsearch documentstore get_document_without_embedding_count() test case added.

* [document_stores] Add new bool arg in get_document_count() method and fixed #1082

* [document_stores] typo fixed #1082

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-05-21 14:18:07 +02:00
Lalit Pagaria
f46b09c756
Using text hash as id to prevent document duplication (#1000)
* using text hash as id to prevent document duplication. Also providing a way to customize it.

* Add latest docstring and tutorial changes

* Fixing duplicate value test when text is same

* Adding test for duplicate ids in document store

* Changing exception to generic Exception type

* add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute

* Add latest docstring and tutorial changes

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-05-17 17:51:52 +02:00
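A minimal sketch of the idea behind #1000 above: deriving the id from a hash of the text makes identical texts collide on the same id, so a store can detect duplicates. The helper name `text_hash_id`, the choice of SHA-256, and the dict-based store are assumptions for illustration, not the hash function or API used in the commit.

```python
import hashlib


def text_hash_id(text: str) -> str:
    """Hypothetical helper: derive a deterministic id from the document text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# Hypothetical in-memory store keyed by the derived id.
store = {}
for text in ["Berlin is a city.", "Berlin is a city.", "Paris is a city."]:
    doc_id = text_hash_id(text)
    if doc_id in store:
        # Same text yields the same id, so the duplicate is skipped.
        continue
    store[doc_id] = text
```

Under these assumptions the second "Berlin" text is skipped because it hashes to an id that is already present.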
Malte Pietsch
25d1122773
Upgrade milvus to 1.1.0 (#1066)
* upgrade milvus in CI to 1.1

* fix pymilvus

* loosen pymilvus requirement again

* add date to cache keys

* fix date var in action
2021-05-17 17:27:34 +02:00
Moshe Berchansky
880edd139d
Add use_amp to DPR's train method to enable mixed precision training. (#1048) 2021-05-17 15:10:02 +02:00
Ikram Ali
a06e4450d1
Rename delete_all_documents() method to delete_documents() (#1047) 2021-05-10 13:37:08 +02:00
Branden Chan
5d31e633ce
Squad tools (#1029)
* Add first commit

* Add support for conversion to and from pandas df

* Add logging

* Add functionality

* Satisfy mypy

* Incorporate reviewer feedback
2021-05-06 19:02:15 +02:00
Branden Chan
373fef8d1e
Add white space normalization warning (#1022)
* Add white space normalization warning

* Implement safer document id fetching
2021-05-05 17:54:32 +02:00
Branden Chan
aadd8b049a
Add Tutorial 11 to Readme 2021-05-05 15:35:21 +02:00
oryx1729
9bec8859f2
Test ES connection only for the default user (#1028) 2021-05-04 15:03:19 +02:00
oryx1729
c41101ff74
Upgrade streamlit version (#1024) 2021-05-03 17:44:57 +02:00
Julian Risch
bf4563e5d2
Filtering duplicate answers (#1021)
* Allow filtering of duplicate answers as implemented in FARM

* Changed default behavior to filtering exact duplicates

* Change expected test result due to filtering of duplicate answers by default

* Rounding expected test results for comparison with predictions
2021-05-03 17:18:10 +02:00
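A rough sketch of what filtering exact duplicate answers (#1021 above) could look like. The dict fields `answer`, `offset_start`, `offset_end` and `score`, and the function name, are hypothetical; the sketch assumes answers arrive sorted by score in descending order and that "exact duplicate" means same text and same span.

```python
from typing import Dict, List


def filter_exact_duplicates(answers: List[Dict]) -> List[Dict]:
    """Keep the first occurrence of each (text, start, end) combination."""
    seen = set()
    unique = []
    for answer in answers:
        key = (answer["answer"], answer["offset_start"], answer["offset_end"])
        if key not in seen:
            seen.add(key)
            unique.append(answer)
    return unique


answers = [
    {"answer": "Berlin", "offset_start": 10, "offset_end": 16, "score": 0.9},
    {"answer": "Berlin", "offset_start": 10, "offset_end": 16, "score": 0.7},
    {"answer": "Berlin", "offset_start": 42, "offset_end": 48, "score": 0.5},
]
filtered = filter_exact_duplicates(answers)
```

Here the second entry is dropped, while the third is kept because it comes from a different span.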
Bhadresh Savani
ca63f9fee2
Fix debug message for file-upload in UI (#1018) 2021-05-03 09:18:55 +02:00
brandenchan
5b0b3e4616 Merge branch 'master' of https://github.com/deepset-ai/haystack 2021-04-30 16:41:05 +02:00
brandenchan
4cc853d1c3 Update link 2021-04-30 15:06:45 +02:00
Branden Chan
869b493b61
Regen api docs (#1015) 2021-04-30 12:35:13 +02:00
oryx1729
99990e7249
Add export of Pipeline YAML config (#1003) 2021-04-30 12:23:29 +02:00
Mario Jäckle
a00703256f
docs(document_store): add usage information for aws elastic search (#1008)
Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>
2021-04-30 11:38:25 +02:00
Bhadresh Savani
37a72d2f45
Add File Upload Functionality in UI (#995) 2021-04-30 10:46:30 +02:00
Branden Chan
056be3354b
Add pipelines tutorial (#1013) 2021-04-29 18:19:20 +02:00
Branden Chan
9827b3652e
Pipelines tutorial (#991)
* Start Pipelines tutorial

* Make Tutorial 11 run locally

* Add colab compatibility

* Fix pip install

* Add ES install from source

* Add ES install from source

* Add pygraphviz installation

* Incorporate reviewer feedback

* Ensure print_answers() works for Generator output

* Fix typo
2021-04-29 17:31:28 +02:00
Julian Risch
65f1da00cc
knowledge graph documentation (#979)
* Create knowledge_graph.md

* add doc strings to Text2SparqlRetriever

* Add doc strings to GraphDBKnowledgeGraph

* Make method calls unambiguous so it's clear which class is meant
2021-04-27 16:44:40 +02:00
oryx1729
8a57f6b16a
Update tests for FAISSDocumentStore (#999) 2021-04-27 09:55:31 +02:00
Markus Paff
cf8a622e35
Streamlit UI Evaluation mode (#920)
* first running version of eval mode

* restructuring, new naming of elements and testing

* add new files to Docker, how to start with Haystack reference, remove not needed dependencies

* Add latest docstring and tutorial changes

* merged changes

* fixing bugs after breaking changes from last release

* newer version of states in streamlit, more docs for eval mode, eval file as env variable

* eval file as env variable

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-22 17:30:17 +02:00
Branden Chan
9626c0d65e
Update Documentation (#976)
* Add api pages

* Add latest docstring and tutorial changes

* First sweep of usage docs

* Add link to conversion script

* Add import statements

* Add summarization page

* Add web crawler documentation

* Add confidence scores usage

* Add crawler api docs

* Regenerate api docs

* Update summarizer and translator api

* Add indentation (pydoc-markdown 3.10.1)

* Comment out metadata

* Remove Finder deprecation message

* Remove Finder in FAQ

* Update tutorial link

* Incorporate reviewer feedback

* Regen api docs

* Add type annotations

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-22 16:45:29 +02:00
Malte Pietsch
b1e8ebf81a
Create pull_request_template.md 2021-04-22 15:48:39 +02:00
Andrey A
58ea0a62e0
Add links to GitHub Discussion and SO (#984)
* Add link to Stack Overflow

* Add link to GitHub discussions and re-arrange links
2021-04-22 09:51:21 +02:00
Timo Moeller
2e39361f8a
Add maxsamples and convert data dir to path (#989) 2021-04-22 09:35:11 +02:00
oryx1729
7269530e45
Add validation for root node in Pipeline (#987) 2021-04-21 12:18:33 +02:00
oryx1729
8c1e411380
Fix update_embeddings() for FAISSDocumentStore (#978) 2021-04-21 09:56:35 +02:00
Guillim
0051a34ff9
Add root_path option to REST API for reverse proxy deployments (#982) 2021-04-20 11:19:28 +02:00
oryx1729
4dd5a7a744
Make FAISS import optional (#971) 2021-04-15 12:26:34 +02:00
oryx1729
237172f459
Make FAISS import conditional (#970) 2021-04-14 17:34:01 +02:00
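A common pattern for an optional import like the one in #970 and #971 above, shown as a sketch only. The helper `require_faiss` and the error message are hypothetical and not taken from the commits.

```python
try:
    import faiss  # optional dependency, only needed for FAISS-backed stores
except ImportError:
    faiss = None


def require_faiss():
    """Hypothetical guard: fail with a clear message only when FAISS is actually used."""
    if faiss is None:
        raise ImportError(
            "FAISS is not installed. Install it to use a FAISS-backed document store."
        )
    return faiss
```

With this shape, importing the package succeeds without FAISS, and the error surfaces only at the point where FAISS functionality is requested.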
Mario Jäckle
84f90e82c5
feature(aws): add aws iam auth method (#965)
Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>
2021-04-14 16:34:24 +02:00
oryx1729
5bb66940a9
Fix equality check in preprocessor (#969) 2021-04-14 16:03:48 +02:00
Markus Paff
0633dae4d0
new docs version (#964) 2021-04-14 13:40:05 +02:00
oryx1729
bba1d80aef Update Haystack version v0.8.0 2021-04-13 16:31:19 +02:00
Branden Chan
77d4c2ca1c
Benchmark milvus (#850)
* Add milvus benchmarking support

* Add latest docstring and tutorial changes

* Edit config

* Disable docker interactive mode

* Add milvus index type support

* Adjust FAISS and Milvus node branching

* Remove duplicate in config

* Revert method for speedup

* Add latest docstring and tutorial changes

* Add latest benchmark run

* Add latest docstring and tutorial changes

* Add json files

* Revert "Add latest docstring and tutorial changes"

This reverts commit e2efa5f08aa4fb55bbeeed42aa76817d63fc8923.

* Add latest docstring and tutorial changes

* Revert "Add latest docstring and tutorial changes"

This reverts commit b085a679b9d5f175e91c2c59565e73c5dec1374b.

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-13 14:54:15 +02:00
Markus Paff
b87daed62b
fixed link to dpr (#962) 2021-04-13 09:45:04 +02:00
Julian Risch
8333a13d6f
Adding tutorial on knowledge graphs to README 2021-04-12 15:26:02 +02:00
Markus Paff
dfb0282b74
Update milvus links and docstrings (#959)
* update milvus links and docstrings

* Add latest docstring and tutorial changes

* new milvus version

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-12 14:38:57 +02:00
oryx1729
406f7fa679
Disable Gunicorn preload option (#960) 2021-04-12 12:46:52 +02:00
Timo Moeller
837dea4e6d
Integrate sentence transformers into benchmarks (#843)
* Integrate sentence transformers into benchmarks

* Add doc store asserts

* switch data downloads from s3 client to https. add license info

* Fix mypy, revert config

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-09 17:24:16 +02:00
Julian Risch
d38c07e0ee
knowledge graph example (#934)
* Add knowledge graph module

* Fix type hint

* Add graph retriever module

* Change type annotations, change return format

* Add graph retriever that executes questions as sparql queries

* Linking only those entities that are in the knowledge graph

* Added logging and using relations extracted from Knowledge graph for linking

* Preventing entity linking from linking the same token to multiple entities

* Pruning triples that have no variables for select and count queries

* Support knowledge graphs with Pipelines

* Add text2sparql

* Entity linking and relation linking consider more special cases now based on evaluation on labelled data

* Separating example code from KGQA implementation

* Add eval on combined extractive and kg questions

* Remove references to hp-test

* Add fields sparql_query and long_answer_list to metadata

* Removing modular Question2SPARQL approach

* Removing additional classes used for modular kgqa approach

* preparing lcquad data

* change graph db

* Translating namespaces in knowledge graph queries

* Creating graphdb index and loading triples from .ttl file

* Fetching graph config files, triples and model from S3

* Fix incompatibility issues with BaseGraphRetriever and BaseComponent

* Removing unused utility functions

* Adding doc strings and tutorial header

* Adding sparqlwrapper dependency

* Moving tutorial header

* Sorting tutorials by number within name of notebook

* Add latest docstring and tutorial changes

* Creating test cases for knowledge graph

* Changing knowledge graph example to harry potter

* Add latest docstring and tutorial changes

* Adapting the tutorial notebook to harry potter example

* Add GraphDB fixture for tests

* Add latest docstring and tutorial changes

* Added GraphDB docker launch to CI

* Use correct GraphDB fixture

* Check if GraphDB instance is already running

* Renaming question/query and incorporating other feedback from Timo and Tanay

* Removed type annotation

* Add latest docstring and tutorial changes

Co-authored-by: oryx1729 <oryx1729@protonmail.com>
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-08 14:05:33 +02:00
oryx1729
fc6368c191
Fix passing a list of values as param (#952) 2021-04-07 19:50:50 +02:00
oryx1729
8c68699e1c
Refactor REST APIs to use Pipelines (#922) 2021-04-07 17:53:32 +02:00
Julian Risch
64ad953c6a
Adding indentation to markup files (#947) 2021-04-07 11:36:11 +02:00
lewtun
8894c4fae9
Reduce precision in pipeline eval print functions (#943)
A proposal to reduce the precision shown in the `EvalRetriever.print` and `EvalReader.print` to 4 significant figures. If the user wants the full precision, they can access the class attributes directly.

Before
```
Retriever
-----------------
has_answer recall: 0.8739495798319328 (208/238)
no_answer recall:  1.00 (120/120) (no_answer samples are always treated as correctly retrieved)
recall: 0.9162011173184358 (328 / 358)
```

After
```
Retriever
-----------------
has_answer recall: 0.8739 (208/238)
no_answer recall:  1.00 (120/120) (no_answer samples are always treated as correctly retrieved)
recall: 0.9162 (328 / 358)
```
2021-04-06 05:11:29 +02:00
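The before/after output in #943 above can be reproduced with a fixed-precision format specifier. This is a sketch using the counts from the example; the use of `:.4f` (four decimal places) is an assumption about how the rounding is done, and the variable names are illustrative.

```python
# Counts taken from the example output in the commit message.
has_answer_correct = 208
has_answer_count = 238

recall = has_answer_correct / has_answer_count

# Format with four decimal places instead of full float precision.
line = f"has_answer recall: {recall:.4f} ({has_answer_correct}/{has_answer_count})"
print(line)  # has_answer recall: 0.8739 (208/238)
```

The unrounded value stays available in `recall` for callers that need full precision.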
lewtun
41a1c8329d
Fix division by zero error in EvalRetriever (#938)
If the first query in the evaluation returns a document with `no_answer=True`, we get a division by zero error because neither `self.has_answer_correct` nor `self.has_answer_count` gets incremented. This fix moves the `self.has_answer_recall` calculation within the if-else block.
2021-04-03 18:13:36 +02:00
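A simplified sketch of the fix described in #938 above. The class `RecallCounter` and its `update` signature are hypothetical stand-ins, not the actual `EvalRetriever` code; only the attribute names come from the commit message.

```python
class RecallCounter:
    """Hypothetical stand-in for the counters described in the commit message."""

    def __init__(self) -> None:
        self.no_answer_count = 0
        self.has_answer_count = 0
        self.has_answer_correct = 0
        self.has_answer_recall = 0.0

    def update(self, no_answer: bool, correct: bool) -> None:
        if no_answer:
            # No division here: has_answer_count may still be zero.
            self.no_answer_count += 1
        else:
            self.has_answer_count += 1
            self.has_answer_correct += int(correct)
            # Computed inside this branch, where has_answer_count >= 1.
            self.has_answer_recall = self.has_answer_correct / self.has_answer_count


counter = RecallCounter()
counter.update(no_answer=True, correct=False)   # first query is a no_answer sample
counter.update(no_answer=False, correct=True)
```

Because the recall is only recomputed in the branch that increments the denominator, a leading `no_answer` sample can no longer trigger a division by zero.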
Timo Moeller
5d2b16f3cc
Update farm version (#936)
* Update farm version

* Add new DPR loading, fix dpr param name

* Add QA model confidence as answer probability, fix params in test
2021-04-01 18:23:05 +02:00
Branden Chan
d77152c469
WIP: Add evaluation nodes for Pipelines (#904)
* Add main eval fns

* WIP: make pipeline_eval.py run

* Fix typo

* Add support for no_answers

* Add latest docstring and tutorial changes

* Working pipeline eval

* Add timing of nodes

* Add latest docstring and tutorial changes

* Refactor and clean

* Update tutorial script

* Set default params

* Update tutorials

* Fix indent

* Add latest docstring and tutorial changes

* Address mypy issues

* Add test

* Fix mypy error

* Clear outputs

* Add doc strings

* Incorporate reviewer feedback

* Add latest docstring and tutorial changes

* Revert query counting

* Fix typo

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-01 17:35:18 +02:00