3803 Commits

Author SHA1 Message Date
Branden Chan
41b537affe
Add FAQ page (#1151)
* Add faq page

* Update faq.md

* Fix mypy CI

* Add question
2021-06-10 17:29:14 +02:00
venuraja79
49886f88f0
Integrate Weaviate as another DocumentStore (#1064)
* Annotation Tool: data is not persisted when using local version #853

* First version of weaviate

* Updated comments

* ran query, get and write tests

* update embeddings, dynamic schema and filters implemented

* Initial set of tests and fixes

* Tests added for update_embeddings and delete documents

* introduced duplicate documents fix

* fixed mypy errors

* Added Weaviate to requirements

* Fix the weaviate docker env variables

* Fixing test dependencies for now

* Created weaviate test marker and fixed query

* Update docstring

* Add documentation

* Bump up weaviate version

* Bump up weaviate version in documentation

* Upgrade weaviate version

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-10 09:43:53 +02:00
Lalit Pagaria
db17d73a82
Fixing issues caused due to mypy upgrade (#1165) 2021-06-09 16:24:39 +02:00
Branden Chan
5f0f85989a
Refresh API docs (#1152) 2021-06-09 16:13:58 +02:00
Shahrukh Khan
545c625a37
Add QueryClassifier incl. baseline models (#1099)
* restructure query classifier code and add s3 based pickles

* make model and vectorizer optional in query classifier

* update query classifier as per init style

* add query classifiers sklearn/hf

* update docstrings for query classifiers

* add unit test for query classifier

* add type patch for sklearn classifier

* fix mypy type issue

* revert to pure formatting

* add query classifiers

* resolve conflict

* add output names for query classifier

* revert output and update docstring queryclassifier

* Update docstring for SklearnQueryClassifier

* update transformer query classifier docstring

* fix typo

* change arg names in query classifier classes

* add set_config(). rename attributes

* fix set_config()

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-08 15:20:13 +02:00
Malte Pietsch
600636e77b
Update README.md 2021-06-08 09:23:56 +02:00
Branden Chan
59e3c55c47
Add More top_k handling to EvalDocuments (#1133)
* Improve top_k support

* Adjust warning

* Satisfy mypy

* Reinit eval counts if top_k has changed

* Incorporate reviewer feedback
2021-06-07 12:11:00 +02:00
Branden Chan
c513865566
Add L2 support for FAISS HNSW (#1138) 2021-06-04 11:05:18 +02:00
Julian Risch
580e28344d
Add docu of confidence scores and calibration method (#1131)
* Add docu of confidence scores and calibration method
2021-06-03 15:49:07 +02:00
Malte Pietsch
a1472b040c
Add badges (#1136) 2021-06-03 14:47:08 +02:00
Malte Pietsch
b41719b7c8
Add config to JoinDocuments node to allow yaml export in pipelines (#1134)
* add config to JoinNode to allow yaml export

* remove test print
2021-06-03 11:03:25 +02:00
Julian Risch
8e3d0d1287
Distinguish labels for calculating similarity scores (#1124)
* Distinguish labels for calculating similarity scores

* Explain label "0" and "1" of TextPairClassifier in Ranker
2021-06-02 17:33:36 +02:00
Branden Chan
b555bc525c
Remove duplicate run (#1132) 2021-06-02 13:58:55 +02:00
Branden Chan
09ba75073c
Improve Milvus HNSW Performance (#1127)
* Add simplified script

* Optimize HNSW index creation

* Adjust benchmark order

* Rename script
2021-06-02 13:17:35 +02:00
Branden Chan
9356f637d4
Update Milvus benchmarks (#1128)
* Update Milvus benchmarks

* Add sentence transformers

* Update sentence transformers index results

* Remove duplicate row
2021-06-02 13:09:45 +02:00
Branden Chan
aa6f768efa
Prevent merge of same questions on different documents during evaluation (#1119)
* Fix duplicate question in Reader.eval()

* Add duplicate question support in document store

* Support duplicate questions in retriever eval

* Update tutorial

* Rename key_tuple

* Change error message

* Add warning when more than 6 labels

* Allow for label grouping options

* Add support for aggregating by label meta

* Satisfy mypy

* Make label field flexible, add docstrings

* Satisfy mypy

* Fix failing tests

* Adjust docstring

* Fix tutorial

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-06-02 12:09:03 +02:00
Branden Chan
d8c47ed525
Preserve whitespace (#1121) 2021-06-02 12:08:22 +02:00
Malte Pietsch
022f8586f6
Remove Python 3.6 support (#1059)
* Remove Python 3.6 support

* change cache key for CI
2021-06-01 15:24:44 +02:00
Julian Risch
a7ba146246
Removed comma from last item in json list (#1114) 2021-06-01 12:32:21 +02:00
Julian Risch
40ceaf418a
Fixing grpcio-tools to version of colab's pre-installed grpcio (#1113) 2021-05-31 19:09:10 +02:00
Alvise Sembenico
6326cf5710
🐳 add PDF converter dependencies to Docker (#1107) 2021-05-31 19:01:02 +02:00
Branden Chan
6ca6ac0632
Add OpenDistro init (#1101) 2021-05-31 18:59:20 +02:00
Julian Risch
84c34295a1
Re-ranking component for document search without QA (#1025)
* Adding ranker similar to retriever and reader

* Sort documents according to query-document similarity scores

* Reranking and model training runs for small example

* Added EvalRanker node

* Calculate recall@k in EvalRetriever and EvalRanker nodes

* Renaming EvalRetriever to EvalDocuments and EvalReader to EvalAnswers

* Added mean reciprocal rank as metric for EvalDocuments

* Fix bug that appeared when ranking documents with same score

* Remove commented code for unimplemented eval() of Ranker node

* Add documentation of k parameter in EvalDocuments

* Add Ranker docu and renaming top_k param
2021-05-31 15:31:36 +02:00
Michaël Bitard
b5cae20ddb
Fix typo in streamlit UI (#1106) 2021-05-28 11:18:09 +02:00
Ikram Ali
94f1a2b5c9
Improve speed of FAISSDocumentStore.delete_documents() (#1095) 2021-05-26 07:56:09 +02:00
Ikram Ali
b76ed4c5a4
Add options for handling duplicate documents (skip, fail, overwrite) (#1088)
* [document_stores] Duplicate document implementation added for memorystore.

* [document_stores] duplicate documents implementation done for faiss store.

* [document_store] Duplicate document feature added for elasticsearch document store fixed #1069

* [document_store] Duplicate documents feature added for milvus document store and bug fixed in faiss document store fixed #1069

* [document_store] Code refactored fixed #1069

* [document_store] Test cases refactored.

* [document_store] mypy issue fixed.

* [test_case] faiss and milvus test case refactored to support duplicate documents implementation. fixed #1069

* [document_store] duplicate_documents_options code refactored.

* [document_store] Code refactored.
2021-05-25 13:30:06 +02:00
Avishekh Shrestha
c4ee32d47d
Fix typo in preprocessing.md (#1087)
Correct variable name from 'd' to 'doc' in line 134.
2021-05-23 19:16:58 +02:00
Ikram Ali
4ab1bc3c3e
Improve the progress bar in update_embeddings() + Fix filters in update_embeddings() (#1063)
* [document_stores] Add a progress bar in update_embeddings() to track overall document progress closed #1037

* change 2nd level loop to docs. switch to tqdm.auto.

* [document_stores] Elasticsearch new method get_document_without_embedding_count() added.

* [test_case] Elasticsearch documentstore get_document_without_embedding_count() test case added.

* [document_stores] Add new bool arg in get_document_count() method and fixed #1082

* [document_stores] typo fixed #1082

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-05-21 14:18:07 +02:00
Lalit Pagaria
f46b09c756
Using text hash as id to prevent document duplication (#1000)
* using text hash as id to prevent document duplication. Also providing a way to customize it.

* Add latest docstring and tutorial changes

* Fixing duplicate value test when text is same

* Adding test for duplicate ids in document store

* Changing exception to generic Exception type

* add exception for inmemory. update docstring Document. remove id_hash_keys from object attribute

* Add latest docstring and tutorial changes

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-05-17 17:51:52 +02:00
Malte Pietsch
25d1122773
Upgrade milvus to 1.1.0 (#1066)
* upgrade milvus in CI to 1.1

* fix pymilvus

* loosen pymilvus requirement again

* add date to cache keys

* fix date var in action
2021-05-17 17:27:34 +02:00
Moshe Berchansky
880edd139d
Add use_amp to DPR's train method to enable mixed precision training. (#1048) 2021-05-17 15:10:02 +02:00
Ikram Ali
a06e4450d1
Rename delete_all_documents() method to delete_documents() (#1047) 2021-05-10 13:37:08 +02:00
Branden Chan
5d31e633ce
Squad tools (#1029)
* Add first commit

* Add support for conversion to and from pandas df

* Add logging

* Add functionality

* Satisfy mypy

* Incorporate reviewer feedback
2021-05-06 19:02:15 +02:00
Branden Chan
373fef8d1e
Add white space normalization warning (#1022)
* Add white space normalization warning

* Implement safer document id fetching
2021-05-05 17:54:32 +02:00
Branden Chan
aadd8b049a
Add Tutorial 11 to Readme 2021-05-05 15:35:21 +02:00
oryx1729
9bec8859f2
Test ES connection only for the default user (#1028) 2021-05-04 15:03:19 +02:00
oryx1729
c41101ff74
Upgrade streamlit version (#1024) 2021-05-03 17:44:57 +02:00
Julian Risch
bf4563e5d2
Filtering duplicate answers (#1021)
* Allow filtering of duplicate answers as implemented in FARM

* Changed default behavior to filtering exact duplicates

* Change expected test result due to filtering of duplicate answers by default

* Rounding expected test results for comparison with predictions
2021-05-03 17:18:10 +02:00
Bhadresh Savani
ca63f9fee2
Fix debug message for file-upload in UI (#1018) 2021-05-03 09:18:55 +02:00
brandenchan
5b0b3e4616 Merge branch 'master' of https://github.com/deepset-ai/haystack 2021-04-30 16:41:05 +02:00
brandenchan
4cc853d1c3 Update link 2021-04-30 15:06:45 +02:00
Branden Chan
869b493b61
Regen api docs (#1015) 2021-04-30 12:35:13 +02:00
oryx1729
99990e7249
Add export of Pipeline YAML config (#1003) 2021-04-30 12:23:29 +02:00
Mario Jäckle
a00703256f
docs(document_store): add usage information for aws elastic search (#1008)
Co-authored-by: Mario Jäckle <m.jaeckle@careerpartner.eu>
2021-04-30 11:38:25 +02:00
Bhadresh Savani
37a72d2f45
Add File Upload Functionality in UI (#995) 2021-04-30 10:46:30 +02:00
Branden Chan
056be3354b
Add pipelines tutorial (#1013) 2021-04-29 18:19:20 +02:00
Branden Chan
9827b3652e
Pipelines tutorial (#991)
* Start Pipelines tutorial

* Make Tutorial 11 run locally

* Add colab compatibility

* Fix pip install

* Add ES install from source

* Add pygraphviz installation

* Incorporate reviewer feedback

* Ensure print_answers() works for Generator output

* Fix typo
2021-04-29 17:31:28 +02:00
Julian Risch
65f1da00cc
knowledge graph documentation (#979)
* Create knowledge_graph.md

* add doc strings to Text2SparqlRetriever

* Add doc strings to GraphDBKnowledgeGraph

* Make method calls unambiguous so it's clear which class is meant
2021-04-27 16:44:40 +02:00
oryx1729
8a57f6b16a
Update tests for FAISSDocumentStore (#999) 2021-04-27 09:55:31 +02:00
Markus Paff
cf8a622e35
Streamlit UI Evaluation mode (#920)
* first running version of eval mode

* restructuring, new naming of elements and testing

* add new files to Docker, how to start with Haystack reference, remove not needed dependencies

* Add latest docstring and tutorial changes

* merged changes

* fixing bugs after breaking changes from last release

* newer version of states in streamlit, more docs for eval mode, eval file as env variable

* eval file as env variable

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-04-22 17:30:17 +02:00