Tanay Soni
3399fc784d
Refactor file converter interface ( #393 )
2020-09-18 10:42:13 +02:00
Malte Pietsch
4e46d9d176
remove dpr_utils.py
2020-09-17 17:17:19 +02:00
Tanay Soni
06243dbda4
Move retriever probability calculations to document_store ( #389 )
2020-09-17 16:25:46 +02:00
Tanay Soni
03fa4a8740
Exclude embedding fields from the REST API ( #390 )
2020-09-17 14:37:01 +02:00
Malte Pietsch
3782646948
Add logo to readme ( #384 )
* add logo image
* add logo to readme
* change img path to master
* Update README.rst
2020-09-16 18:36:22 +02:00
Malte Pietsch
9727829cc6
Rename and restructure modules (database, indexing, schemas) ( #379 )
* rename database to documentstore
* move document, label, multilabel to haystack/schema.py
* rename documentstore -> document_store
* split indexing modules -> file_converter + preprocessor
* fix order of imports
* Update tutorial notebooks
* fix torch version in tutorial 4
2020-09-16 18:33:23 +02:00
Malte Pietsch
bde33ddaaa
Bump FARM version to 0.4.8 and PyTorch >=1.5.1, <= 1.6.0 ( #376 )
* bump farm version to 0.4.8
* move back to original transformers pipeline
* remove dpr_utils and use transformers implementation
* update tutorial notebooks
2020-09-16 17:24:40 +02:00
Lalit P
de5ad42e46
Adjust tests for MacOS ( #374 )
2020-09-15 15:04:46 +02:00
Tanay Soni
c0c2865e58
Add FAISS query scores ( #368 )
2020-09-11 13:59:38 +02:00
Tanay Soni
9d93ffbe54
Add Gunicorn timeout ( #364 )
2020-09-10 09:20:39 +02:00
maxupp
06e8be30ea
Add index arg to Finder.get_answers() and _via_similar_questions() ( #362 )
Co-authored-by: Max Uppenkamp <max.uppenkamp@inform-software.com>
2020-09-09 12:39:13 +02:00
Malte Pietsch
b1cdc68d6c
Update README.rst
2020-09-09 11:47:17 +02:00
Malte Pietsch
d821e8d260
Bump FARM version to 0.4.7 ( #340 )
2020-09-04 17:29:14 +02:00
Tanay Soni
26e4e7ad7a
Use port 8000 in docs ( #357 )
2020-09-04 09:54:24 +02:00
Malte Pietsch
a634eee40c
Update README.rst
2020-08-28 13:35:11 +02:00
Malte Pietsch
686e55b77c
Create CONTRIBUTING.md
2020-08-28 13:20:19 +02:00
Tanay Soni
f0fe16774f
Raise exception if filter supplied for query ( #338 )
2020-08-27 10:05:10 +02:00
Branden Chan
0ad22d5038
Merge pull request #331 from deepset-ai/robust_eval
More robust Reader eval by limiting max answers and creating no answer labels
2020-08-26 13:28:10 +02:00
brandenchan
f108939fc3
Change warning to info
2020-08-26 13:27:30 +02:00
brandenchan
b44b1ac6ec
Set top_k_per_candidate
2020-08-26 12:03:56 +02:00
brandenchan
cca8676f90
More robust eval
2020-08-26 12:01:59 +02:00
kolk
f2b6cc761b
Refactor DPR from FB to Transformers codebase ( #308 )
* change_HFBertEncoder to transformers DPREncoder
* Removed BertTensorizer
* model download relative path
* Refactor model load
* Tutorial5 DPR updated
* fix print_eval_results typo
* copy transformers DPR modules in dpr_utils and test
* transformer v3.0.2 import errors fixed
* remove dependency of DPRConfig on attribute use_return_tuple
* Adjust transformers 302 locally to work with dpr
* projection layer removed from DPR encoders
* fixed mypy errors
* transformers DPR compatible code added
* transformers DPR compatibility added
* bug fix in tutorial 6 notebook
* Docstring update and variable naming issues fix
* tutorial modified to reflect DPR variable naming change
* title addition to passage use-cases handled
* modified handling untitled batch
* resolved mypy errors
* typos in docstrings and comments fixed
* cleaned DPR code and added new test cases
* warnings added for non-bert model [SEP] token removal
* changed warning to logger warning
* title mask creation refactored
* bug fix on cuda issues
* tutorial 6 instantiates modified DPR
* tutorial 5 modified
* tutorial 5 ipython notebook modified: DPR instantiation
* batch_size added to DPR instantiation
* tutorial 5 jupyter notebook typos fixed
* improved docstrings, fixed typos
* Update docstring
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-08-25 20:16:00 +05:30
venuraja79
ea334658d6
DPR (Dense Retriever) for InMemoryDocumentStore #316 ( #332 )
2020-08-24 14:48:36 +02:00
Tanay Soni
3a42eb663e
Include InMemoryDocumentStore for DPR test
2020-08-24 14:44:12 +02:00
antoniolanza1996
85c743fe2a
Add refresh_type arg to ElasticsearchDocumentStore ( #326 )
* Added refresh type into ElasticsearchDocumentStore
* Update docstring
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-08-21 09:36:04 +02:00
Tanay Soni
7d2a8f19fc
Improve speed for SQLDocumentStore ( #330 )
2020-08-21 09:24:49 +02:00
Branden Chan
a54d6a5bd7
Make Tutorials Work on Colab GPUs ( #322 )
* Add pip install torch+cu
2020-08-19 14:52:50 +02:00
bogdankostic
f388ca025c
Aggregate multiple no answers in MultiLabel ( #324 )
* Aggregate multiple no answers
* Add test for multiple no answers
2020-08-18 18:25:01 +02:00
bogdankostic
3a95fe2006
Align TransformersReader with FARMReader ( #319 )
* Align TransformersReader with FARMReader
2020-08-18 14:26:33 +02:00
bogdankostic
72b1013560
Restructure update embeddings ( #304 )
* Restructure update embeddings
* Adapt FAISSDocStore
* Adapt test and tutorial
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
2020-08-18 14:04:31 +02:00
brandenchan
8a3eca05c3
Change to retriever eval top_k to match notebook
2020-08-18 11:39:49 +02:00
bogdankostic
b30963d0cd
Add Tests for MultiLabel ( #318 )
* Add tests for MultiLabel
* Add test for no_answer and is_correct_answer=False + fix bug in MultiLabel aggregation
* Fix bug in MultiLabel aggregation
2020-08-17 20:14:31 +02:00
Tanay Soni
01ff66dfd6
Remove redundant test fixture
2020-08-17 14:19:38 +02:00
Tanay Soni
200bb4bafd
Refactor the DPR tutorial to use FAISS ( #317 )
2020-08-17 13:30:02 +02:00
Dany
f0222ecd27
Add Tika in CI
2020-08-17 11:35:33 +02:00
Dany
9d02b3ff9d
Add requirement for Tika
2020-08-17 11:32:29 +02:00
Dany
ac2fe58b84
Add Tika Converter file ( #314 )
2020-08-17 11:26:56 +02:00
Dany
403318b1f5
Add Tika Converter ( #314 )
2020-08-17 11:21:09 +02:00
Tanay Soni
1637ce1184
Revert "Add Tika Converter ( #314 )"
This reverts commit 5ef59b1901da6d51bfa085683321a243228d4fc9.
2020-08-17 11:13:52 +02:00
Tanay Soni
5ef59b1901
Add Tika Converter ( #314 )
2020-08-14 14:13:59 +02:00
Timo Moeller
eb0fc6439c
Remove inconsistent underscore in test_filename
2020-08-13 14:53:43 +02:00
Tanay Soni
089fecf99e
Fix indexing of metadata for FAISS/SQL Document Store ( #310 )
2020-08-13 12:25:32 +02:00
Tanay Soni
397dcf9d92
Ensure exact match when filtering by meta in Elasticsearch ( #311 )
2020-08-13 11:42:49 +02:00
Timo Moeller
4eeb7818af
Set dev split to float, add docstring
2020-08-11 15:28:36 +02:00
Timo Moeller
85d384244a
Remove dev splitting from qa training
2020-08-11 14:36:42 +02:00
Timo Moeller
8dd0ce963e
Datasilo use all cores for preprocessing ( #303 )
* Set correct default val
2020-08-11 09:45:58 +02:00
bogdankostic
5186d2d235
Batch prediction in evaluation ( #137 )
* Add Batch evaluation
* Separate evaluation methods
* Clean calculation of eval metrics
* Adapt eval to Label objects
* Fix format of no_answer
* Adapt to MultiLabel
* Add tests
2020-08-10 19:30:31 +02:00
antoniolanza1996
860f860b00
Added title during DPR passage embedding && ElasticsearchDocumentStore ( #298 )
* Added title during DPR passage embedding && ElasticsearchDocumentStore
* Added if-else to check if name is in the Elasticsearch meta docs
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
2020-08-10 15:36:18 +02:00
Karim Jana
c7078a36c0
Custom fields for indexing in ElasticsearchDocumentStore ( #297 )
2020-08-10 11:34:39 +02:00
Tanay Soni
2d27f19a71
Change default FAISS requirement to CPU
2020-08-07 16:23:15 +02:00