3174 Commits

Author SHA1 Message Date
Timo Moeller
4eeb7818af Set dev split to float, add docstring 2020-08-11 15:28:36 +02:00
Timo Moeller
85d384244a Remove dev splitting from qa training 2020-08-11 14:36:42 +02:00
Timo Moeller
8dd0ce963e
Datasilo use all cores for preprocessing (#303)
* Set correct default val
2020-08-11 09:45:58 +02:00
bogdankostic
5186d2d235
Batch prediction in evaluation (#137)
* Add Batch evaluation

* Separate evaluation methods

* Clean calculation of eval metrics

* Adapt eval to Label objects

* Fix format of no_answer

* Adapt to MultiLabel

* Add tests
2020-08-10 19:30:31 +02:00
antoniolanza1996
860f860b00
Added title during DPR passage embedding && ElasticsearchDocumentStore (#298)
* Added title during DPR passage embedding && ElasticsearchDocumentStore

* Added if-else to check if name is in the Elasticsearch meta docs

Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
2020-08-10 15:36:18 +02:00
Karim Jana
c7078a36c0
Custom fields for indexing in ElasticsearchDocumentStore (#297) 2020-08-10 11:34:39 +02:00
Tanay Soni
2d27f19a71 Change default FAISS requirement to CPU 2020-08-07 16:23:15 +02:00
Tanay Soni
9d0df60aad
Add FAISS Document Store (#253) 2020-08-07 14:25:08 +02:00
Timo Moeller
72e6867278
Aggregate label objects for same questions (#292)
* Add aggregate labels obj, use in retriever eval function

* Change launch ES param

* Move aggregation from ES document store to base class

* Fix type annotations
2020-08-07 11:24:41 +02:00
Timo Moeller
d9e8b522a1
Add "no answer" aggregation to Transformersreader (#259)
* Add no answer aggregation

* Change to covariant type annotation

* Remove n_best_per_passage from transformersreader
2020-08-06 17:32:55 +02:00
Karim Jana
89dcfed619
Cast Search REST API logs to JSON (#290) 2020-08-06 10:36:56 +02:00
Tanay Soni
5937f9cf16
Deprecate Tags for Document Stores (#286) 2020-08-04 14:24:12 +02:00
Tanay Soni
6a103252ef
Add option to update existing documents when indexing (#285) 2020-08-04 08:54:09 +02:00
Tanay Soni
723921475f
Make document ids of str type (#284) 2020-08-03 16:20:17 +02:00
Tanay Soni
d90435efd6 Add wait for Elasticsearch update call 2020-07-31 12:06:27 +02:00
Malte Pietsch
29a15c0d59
Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243) 2020-07-31 11:34:06 +02:00
Tanay Soni
52370c7bd4
Update README.rst 2020-07-30 08:59:56 +02:00
Timo Moeller
5541a53f2d
Add export answers to CSV function (#266)
Add export answers to CSV function
2020-07-29 17:10:44 +02:00
Malte Pietsch
abec1be722
Add num_processes to reader.train() to configure multiprocessing (#271) 2020-07-29 16:28:23 +02:00
Malte Pietsch
52a805be86
Update README.rst 2020-07-24 21:11:28 +02:00
Malte Pietsch
02ae0ccad1 Resize sketch concepts 2020-07-24 21:09:31 +02:00
Malte Pietsch
e1962a4e4b update sketch concepts 2020-07-24 21:05:24 +02:00
Malte Pietsch
6283348096 add concept sketch 2020-07-24 21:01:37 +02:00
antoniolanza1996
b55de6f70a
Added support for unanswerable questions in TransformersReader (#258)
* Added support for unanswerable questions in TransformersReader

Co-authored-by: Antonio Lanza <anotniolanza1996@gmail.com>
2020-07-23 10:45:58 +02:00
Timo Moeller
f0d901a374 Simplify farmreader predict 2020-07-23 10:27:43 +02:00
Malte Pietsch
ce50718103
Update README.rst 2020-07-20 14:58:12 +02:00
antoniolanza1996
cdaa6f0c66
Fix type of query_emb in DPR.retrieve() (#247) 2020-07-18 22:13:52 +02:00
Malte Pietsch
5b1be233d0 Update Tutorial 4 2020-07-17 19:31:00 +02:00
Malte Pietsch
355be293b6
Fix return type of EmbeddingRetriever to numpy array (#245) 2020-07-17 19:03:31 +02:00
Malte Pietsch
4da480aa15 Fix dockerfiles 2020-07-16 15:58:49 +02:00
Tanay Soni
5210c8c2ab
Add method to update meta fields for documents in Elasticsearch (#242) 2020-07-16 15:34:55 +02:00
Malte Pietsch
a6ec430931 Fix readme rst syntax 0.3.0 2020-07-16 13:27:44 +02:00
Malte Pietsch
d2d048c9fa Upgrade version number to 0.3.0 2020-07-16 13:21:00 +02:00
Malte Pietsch
1289cc6fbb
Fix format of /export-doc-qa-feedback to comply with SQuAD (#241) 2020-07-16 13:17:45 +02:00
Tanay Soni
292b599cdd
Remove meta field when indexing in Elasticsearch (#240) 2020-07-16 13:11:04 +02:00
Malte Pietsch
cec6a0e821
Update README.rst 2020-07-16 11:05:25 +02:00
Malte Pietsch
6bed2f509f
Refactor DPR for latest transformers version & change init arg gpu -> use_gpu for DPR and EmbeddingRetriever (#239)
* fix tokenizer warning in latest transformers

* change dpr arg from gpu to use_gpu

* change gpu arg for EmbeddingRetriever
2020-07-16 10:45:01 +02:00
Malte Pietsch
e5b6546112 Change default reader for REST API 2020-07-16 10:02:06 +02:00
Malte Pietsch
337680baf5
Update README.rst 2020-07-16 09:38:39 +02:00
Anirban Saha
7e24620159
Update readme (#229) 2020-07-15 19:14:25 +02:00
Malte Pietsch
c9d3146fae
Fix multi-gpu training via DataParallel (#234) 2020-07-15 18:34:55 +02:00
Tanay Soni
5c1a5fe61d
Add dummy retriever for benchmarking / reader-only settings (#235) 2020-07-15 17:22:17 +02:00
Malte Pietsch
eb658d308e Upgrade version to 0.2.2 2020-07-15 17:07:29 +02:00
Branden Chan
36867dabac change from top_n_recall to accuracy 2020-07-15 17:05:08 +02:00
Branden Chan
ec795314dc
Merge pull request #233 from deepset-ai/update_eval_data
Fix Evaluation Dataset
2020-07-15 16:45:29 +02:00
Branden Chan
64721d3196 One more update 2020-07-15 16:24:10 +02:00
Branden Chan
c55477e0ce update eval dataset 2020-07-15 16:14:52 +02:00
Tanay Soni
912e98cd40
Fix id for documents returned by the TfidfRetriever (#232) 2020-07-15 14:55:07 +02:00
Tanay Soni
4e10a1520d
Remove mutation of documents in write_documents() (#231) 2020-07-15 13:10:52 +02:00
Tanay Soni
e1d64c2c68
Fix print_answers to not delete keys from passed results object (#230) 2020-07-15 12:49:14 +02:00