Malte Pietsch
a634eee40c
Update README.rst
2020-08-28 13:35:11 +02:00
Malte Pietsch
686e55b77c
Create CONTRIBUTING.md
2020-08-28 13:20:19 +02:00
Tanay Soni
f0fe16774f
Raise exception if filter supplied for query ( #338 )
2020-08-27 10:05:10 +02:00
Branden Chan
0ad22d5038
Merge pull request #331 from deepset-ai/robust_eval
...
More robust Reader eval by limiting max answers and creating no answer labels
2020-08-26 13:28:10 +02:00
brandenchan
f108939fc3
Change warning to info
2020-08-26 13:27:30 +02:00
brandenchan
b44b1ac6ec
Set top_k_per_candidate
2020-08-26 12:03:56 +02:00
brandenchan
cca8676f90
More robust eval
2020-08-26 12:01:59 +02:00
kolk
f2b6cc761b
Refactor DPR from FB to Transformers codebase ( #308 )
...
* Change HFBertEncoder to transformers DPREncoder
* Removed BertTensorizer
* model download relative path
* Refactor model load
* Tutorial5 DPR updated
* fix print_eval_results typo
* copy transformers DPR modules in dpr_utils and test
* transformers v3.0.2 import errors fixed
* remove dependency of DPRConfig on attribute use_return_tuple
* Adjust transformers 3.0.2 locally to work with DPR
* projection layer removed from DPR encoders
* fixed mypy errors
* transformers DPR compatible code added
* transformers DPR compatibility added
* bug fix in tutorial 6 notebook
* Docstring update and variable naming issues fix
* tutorial modified to reflect DPR variable naming change
* title addition to passage use-cases handled
* modified handling untitled batch
* resolved mypy errors
* typos in docstrings and comments fixed
* cleaned DPR code and added new test cases
* warnings added for non-bert model [SEP] token removal
* changed warning to logger warning
* title mask creation refactored
* bug fix on cuda issues
* tutorial 6 instantiates modified DPR
* tutorial 5 modified
* tutorial 5 ipython notebook modified: DPR instantiation
* batch_size added to DPR instantiation
* tutorial 5 jupyter notebook typos fixed
* improved docstrings, fixed typos
* Update docstring
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-08-25 20:16:00 +05:30
venuraja79
ea334658d6
DPR (Dense Retriever) for InMemoryDocumentStore #316 ( #332 )
2020-08-24 14:48:36 +02:00
Tanay Soni
3a42eb663e
Include InMemoryDocumentStore for DPR test
2020-08-24 14:44:12 +02:00
antoniolanza1996
85c743fe2a
Add refresh_type arg to ElasticsearchDocumentStore ( #326 )
...
* Added refresh type into ElasticsearchDocumentStore
* Update docstring
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-08-21 09:36:04 +02:00
Tanay Soni
7d2a8f19fc
Improve speed for SQLDocumentStore ( #330 )
2020-08-21 09:24:49 +02:00
Branden Chan
a54d6a5bd7
Make Tutorials Work on Colab GPUs ( #322 )
...
* Add pip install torch+cu
2020-08-19 14:52:50 +02:00
bogdankostic
f388ca025c
Aggregate multiple no answers in MultiLabel ( #324 )
...
* Aggregate multiple no answers
* Add test for multiple no answers
2020-08-18 18:25:01 +02:00
bogdankostic
3a95fe2006
Align TransformersReader with FARMReader ( #319 )
2020-08-18 14:26:33 +02:00
bogdankostic
72b1013560
Restructure update embeddings ( #304 )
...
* Restructure update embeddings
* Adapt FAISSDocStore
* Adapt test and tutorial
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
2020-08-18 14:04:31 +02:00
brandenchan
8a3eca05c3
Change retriever eval top_k to match notebook
2020-08-18 11:39:49 +02:00
bogdankostic
b30963d0cd
Add Tests for MultiLabel ( #318 )
...
* Add tests for MultiLabel
* Add test for no_answer and is_correct_answer=False + fix bug in MultiLabel aggregation
* Fix bug in MultiLabel aggregation
2020-08-17 20:14:31 +02:00
Tanay Soni
01ff66dfd6
Remove redundant test fixture
2020-08-17 14:19:38 +02:00
Tanay Soni
200bb4bafd
Refactor the DPR tutorial to use FAISS ( #317 )
2020-08-17 13:30:02 +02:00
Dany
f0222ecd27
Add Tika in CI
2020-08-17 11:35:33 +02:00
Dany
9d02b3ff9d
Add requirement for Tika
2020-08-17 11:32:29 +02:00
Dany
ac2fe58b84
Add Tika Converter file ( #314 )
2020-08-17 11:26:56 +02:00
Dany
403318b1f5
Add Tika Converter ( #314 )
2020-08-17 11:21:09 +02:00
Tanay Soni
1637ce1184
Revert "Add Tika Converter ( #314 )"
...
This reverts commit 5ef59b1901da6d51bfa085683321a243228d4fc9.
2020-08-17 11:13:52 +02:00
Tanay Soni
5ef59b1901
Add Tika Converter ( #314 )
2020-08-14 14:13:59 +02:00
Timo Moeller
eb0fc6439c
Remove inconsistent underscore in test_filename
2020-08-13 14:53:43 +02:00
Tanay Soni
089fecf99e
Fix indexing of metadata for FAISS/SQL Document Store ( #310 )
2020-08-13 12:25:32 +02:00
Tanay Soni
397dcf9d92
Ensure exact match when filtering by meta in Elasticsearch ( #311 )
2020-08-13 11:42:49 +02:00
Timo Moeller
4eeb7818af
Set dev split to float, add docstring
2020-08-11 15:28:36 +02:00
Timo Moeller
85d384244a
Remove dev splitting from qa training
2020-08-11 14:36:42 +02:00
Timo Moeller
8dd0ce963e
Use all cores for preprocessing in DataSilo ( #303 )
...
* Set correct default val
2020-08-11 09:45:58 +02:00
bogdankostic
5186d2d235
Batch prediction in evaluation ( #137 )
...
* Add Batch evaluation
* Separate evaluation methods
* Clean calculation of eval metrics
* Adapt eval to Label objects
* Fix format of no_answer
* Adapt to MultiLabel
* Add tests
2020-08-10 19:30:31 +02:00
antoniolanza1996
860f860b00
Added title during DPR passage embedding && ElasticsearchDocumentStore ( #298 )
...
* Added title during DPR passage embedding && ElasticsearchDocumentStore
* Added if-else to check if name is in the Elasticsearch meta docs
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
2020-08-10 15:36:18 +02:00
Karim Jana
c7078a36c0
Custom fields for indexing in ElasticsearchDocumentStore ( #297 )
2020-08-10 11:34:39 +02:00
Tanay Soni
2d27f19a71
Change default FAISS requirement to CPU
2020-08-07 16:23:15 +02:00
Tanay Soni
9d0df60aad
Add FAISS Document Store ( #253 )
2020-08-07 14:25:08 +02:00
Timo Moeller
72e6867278
Aggregate label objects for same questions ( #292 )
...
* Add aggregate labels obj, use in retriever eval function
* Change launch ES param
* Move aggregation from ES document store to base class
* Fix type annotations
2020-08-07 11:24:41 +02:00
Timo Moeller
d9e8b522a1
Add "no answer" aggregation to TransformersReader ( #259 )
...
* Add no answer aggregation
* Change to covariant type annotation
* Remove n_best_per_passage from TransformersReader
2020-08-06 17:32:55 +02:00
Karim Jana
89dcfed619
Cast Search REST API logs to JSON ( #290 )
2020-08-06 10:36:56 +02:00
Tanay Soni
5937f9cf16
Deprecate Tags for Document Stores ( #286 )
2020-08-04 14:24:12 +02:00
Tanay Soni
6a103252ef
Add option to update existing documents when indexing ( #285 )
2020-08-04 08:54:09 +02:00
Tanay Soni
723921475f
Make document ids of str type ( #284 )
2020-08-03 16:20:17 +02:00
Tanay Soni
d90435efd6
Add wait for Elasticsearch update call
2020-07-31 12:06:27 +02:00
Malte Pietsch
29a15c0d59
Add eval for Dense Passage Retriever & Refactor handling of labels/feedback ( #243 )
2020-07-31 11:34:06 +02:00
Tanay Soni
52370c7bd4
Update README.rst
2020-07-30 08:59:56 +02:00
Timo Moeller
5541a53f2d
Add export answers to CSV function ( #266 )
2020-07-29 17:10:44 +02:00
Malte Pietsch
abec1be722
Add num_processes to reader.train() to configure multiprocessing ( #271 )
2020-07-29 16:28:23 +02:00
Malte Pietsch
52a805be86
Update README.rst
2020-07-24 21:11:28 +02:00
Malte Pietsch
02ae0ccad1
Resize sketch concepts
2020-07-24 21:09:31 +02:00