3803 Commits

Author SHA1 Message Date
Malte Pietsch
a634eee40c
Update README.rst 2020-08-28 13:35:11 +02:00
Malte Pietsch
686e55b77c
Create CONTRIBUTING.md 2020-08-28 13:20:19 +02:00
Tanay Soni
f0fe16774f
Raise exception if filter supplied for query (#338) 2020-08-27 10:05:10 +02:00
Branden Chan
0ad22d5038
Merge pull request #331 from deepset-ai/robust_eval
More robust Reader eval by limiting max answers and creating no answer labels
2020-08-26 13:28:10 +02:00
brandenchan
f108939fc3 Change warning to info 2020-08-26 13:27:30 +02:00
brandenchan
b44b1ac6ec Set top_k_per_candidate 2020-08-26 12:03:56 +02:00
brandenchan
cca8676f90 More robust eval 2020-08-26 12:01:59 +02:00
kolk
f2b6cc761b
Refactor DPR from FB to Transformers codebase (#308)
* change_HFBertEncoder to transformers DPREncoder

* Removed BertTensorizer

* model download relative path

* Refactor model load

* Tutorial5 DPR updated

* fix print_eval_results typo

* copy transformers DPR modules in dpr_utils and test

* transformer v3.0.2 import errors fixed

* remove dependency of DPRConfig on attribute use_return_tuple

* Adjust transformers 302 locally to work with dpr

* projection layer removed from DPR encoders

* fixed mypy errors

* transformers DPR compatible code added

* transformers DPR compatibility added

* bug fix in tutorial 6 notebook

* Docstring update and variable naming issues fix

* tutorial modified to reflect DPR variable naming change

* title addition to passage use-cases handled

* modified handling untitled batch

* resolved mypy errors

* typos in docstrings and comments fixed

* cleaned DPR code and added new test cases

* warnings added for non-bert model [SEP] token removal

* changed warning to logger warning

* title mask creation refactored

* bug fix on cuda issues

* tutorial 6 instantiates modified DPR

* tutorial 5 modified

* tutorial 5 ipython notebook modified: DPR instantiation

* batch_size added to DPR instantiation

* tutorial 5 jupyter notebook typos fixed

* improved docstrings, fixed typos

* Update docstring

Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-08-25 20:16:00 +05:30
venuraja79
ea334658d6
DPR (Dense Retriever) for InMemoryDocumentStore #316 (#332) 2020-08-24 14:48:36 +02:00
Tanay Soni
3a42eb663e
Include InMemoryDocumentStore for DPR test 2020-08-24 14:44:12 +02:00
antoniolanza1996
85c743fe2a
Add refresh_type arg to ElasticsearchDocumentStore (#326)
* Added refresh type into ElasticsearchDocumentStore

* Update docstring

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-08-21 09:36:04 +02:00
Tanay Soni
7d2a8f19fc
Improve speed for SQLDocumentStore (#330) 2020-08-21 09:24:49 +02:00
Branden Chan
a54d6a5bd7
Make Tutorials Work on Colab GPUs (#322)
* Add pip install torch+cu
2020-08-19 14:52:50 +02:00
bogdankostic
f388ca025c
Aggregate multiple no answers in MultiLabel (#324)
* Aggregate multiple no answers

* Add test for multiple no answers
2020-08-18 18:25:01 +02:00
bogdankostic
3a95fe2006
Align TransformersReader with FARMReader (#319)
* Align TransformersReader with FARMReader
2020-08-18 14:26:33 +02:00
bogdankostic
72b1013560
Restructure update embeddings (#304)
* Restructure update embeddings

* Adapt FAISSDocStore

* Adapt test and tutorial

Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
2020-08-18 14:04:31 +02:00
brandenchan
8a3eca05c3 Change to retriever eval top_k to match notebook 2020-08-18 11:39:49 +02:00
bogdankostic
b30963d0cd
Add Tests for MultiLabel (#318)
* Add tests for MultiLabel

* Add test for no_answer and is_correct_answer=False + fix bug in MultiLabel aggregation

* Fix bug in MultiLabel aggregation
2020-08-17 20:14:31 +02:00
Tanay Soni
01ff66dfd6 Remove redundant test fixture 2020-08-17 14:19:38 +02:00
Tanay Soni
200bb4bafd
Refactor the DPR tutorial to use FAISS (#317) 2020-08-17 13:30:02 +02:00
Dany
f0222ecd27 Add Tika in CI 2020-08-17 11:35:33 +02:00
Dany
9d02b3ff9d Add requirement for Tika 2020-08-17 11:32:29 +02:00
Dany
ac2fe58b84 Add Tika Converter file (#314) 2020-08-17 11:26:56 +02:00
Dany
403318b1f5 Add Tika Converter (#314) 2020-08-17 11:21:09 +02:00
Tanay Soni
1637ce1184 Revert "Add Tika Converter (#314)"
This reverts commit 5ef59b1901da6d51bfa085683321a243228d4fc9.
2020-08-17 11:13:52 +02:00
Tanay Soni
5ef59b1901
Add Tika Converter (#314) 2020-08-14 14:13:59 +02:00
Timo Moeller
eb0fc6439c Remove inconsistent underscore in test_filename 2020-08-13 14:53:43 +02:00
Tanay Soni
089fecf99e
Fix indexing of metadata for FAISS/SQL Document Store (#310) 2020-08-13 12:25:32 +02:00
Tanay Soni
397dcf9d92
Ensure exact match when filtering by meta in Elasticsearch (#311) 2020-08-13 11:42:49 +02:00
Timo Moeller
4eeb7818af Set dev split to float, add docstring 2020-08-11 15:28:36 +02:00
Timo Moeller
85d384244a Remove dev splitting from qa training 2020-08-11 14:36:42 +02:00
Timo Moeller
8dd0ce963e
Datasilo use all cores for preprocessing (#303)
* Set correct default val
2020-08-11 09:45:58 +02:00
bogdankostic
5186d2d235
Batch prediction in evaluation (#137)
* Add Batch evaluation

* Separate evaluation methods

* Clean calculation of eval metrics

* Adapt eval to Label objects

* Fix format of no_answer

* Adapt to MultiLabel

* Add tests
2020-08-10 19:30:31 +02:00
antoniolanza1996
860f860b00
Added title during DPR passage embedding && ElasticsearchDocumentStore (#298)
* Added title during DPR passage embedding && ElasticsearchDocumentStore

* Added if-else to check if name is in the Elasticsearch meta docs

Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
2020-08-10 15:36:18 +02:00
Karim Jana
c7078a36c0
Custom fields for indexing in ElasticsearchDocumentStore (#297) 2020-08-10 11:34:39 +02:00
Tanay Soni
2d27f19a71 Change default FAISS requirement to CPU 2020-08-07 16:23:15 +02:00
Tanay Soni
9d0df60aad
Add FAISS Document Store (#253) 2020-08-07 14:25:08 +02:00
Timo Moeller
72e6867278
Aggregate label objects for same questions (#292)
* Add aggregate labels obj, use in retriever eval function

* Change launch ES param

* Move aggregation from ES document store to base class

* Fix type annotations
2020-08-07 11:24:41 +02:00
Timo Moeller
d9e8b522a1
Add "no answer" aggregation to TransformersReader (#259)
* Add no answer aggregation

* Change to covariant type annotation

* Remove n_best_per_passage from transformersreader
2020-08-06 17:32:55 +02:00
Karim Jana
89dcfed619
Cast Search REST API logs to JSON (#290) 2020-08-06 10:36:56 +02:00
Tanay Soni
5937f9cf16
Deprecate Tags for Document Stores (#286) 2020-08-04 14:24:12 +02:00
Tanay Soni
6a103252ef
Add option to update existing documents when indexing (#285) 2020-08-04 08:54:09 +02:00
Tanay Soni
723921475f
Make document ids of str type (#284) 2020-08-03 16:20:17 +02:00
Tanay Soni
d90435efd6 Add wait for Elasticsearch update call 2020-07-31 12:06:27 +02:00
Malte Pietsch
29a15c0d59
Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243) 2020-07-31 11:34:06 +02:00
Tanay Soni
52370c7bd4
Update README.rst 2020-07-30 08:59:56 +02:00
Timo Moeller
5541a53f2d
Add export answers to CSV function (#266)
Add export answers to CSV function
2020-07-29 17:10:44 +02:00
Malte Pietsch
abec1be722
Add num_processes to reader.train() to configure multiprocessing (#271) 2020-07-29 16:28:23 +02:00
Malte Pietsch
52a805be86
Update README.rst 2020-07-24 21:11:28 +02:00
Malte Pietsch
02ae0ccad1 Resize sketch concepts 2020-07-24 21:09:31 +02:00