3174 Commits

Author SHA1 Message Date
Malte Pietsch
db6864d159
Fix type casting for vectors in FAISS (#399)
* Fix type casting for vectors in FAISS

Co-authored-by: philipp-bode <philipp.bode@student.hpi.de>

* add type casts for elastic. refactor embedding retriever tests

* fix case: empty embedding field

* fix faiss tolerance

* add assert in test_faiss_retrieving

Co-authored-by: philipp-bode <philipp.bode@student.hpi.de>
2020-09-18 17:08:13 +02:00
Branden Chan
4ea4cfd282
Merge pull request #400 from deepset-ai/fix_imgs
Fix images in readme
2020-09-18 15:01:20 +02:00
brandenchan
f4a1682570 Fix images 2020-09-18 14:58:03 +02:00
Malte Pietsch
d69133966d Fix faiss test tolerance 2020-09-18 13:57:29 +02:00
Branden Chan
7fdb85d63a
Create documentation website (#272)
* Skeleton of doc website

* Flesh out documentation pages

* Split concepts into their own rst files

* add tutorial rsts

* Consistent level 1 markdown headers in tutorials

* Change theme to readthedocs

* Turn bullet points into prose

* Populate sections

* Add more text

* Add more sphinx files

* Add more retriever documentation

* combined all documentation in one structure

* rename of src to _src as it was ignored by git

* Incorporate MP2's changes

* add benchmark bar charts

* Adapt docstrings in Readers

* Improvements to intro, creation of glossary

* Adapt docstrings in Retrievers

* Adapt docstrings in Finder

* Adapt Docstrings of Finder

* Updates to text

* Edit text

* update doc strings

* proof read tutorials

* Edit text

* Edit text

* Add stacked chart

* populate graph with data

* Switch Documentation to markdown (#386)

* add way to generate markdown files to sphinx

* changed from rst to markdown and extended sphinx for it

* fix spelling

* Clean titles

* delete file

* change spelling

* add sections to document store usage

* add basic rest api docs

* fix readme in setup.py

* Update Tutorials

* Change section names

* add windows note to pip install

* update intro

* new renderer for markdown files

* Fix typos

* delete dpr_utils.py

* fix windows note in get started

* Fix docstrings

* deleted rest api docs in api

* fixed typo

* Fix docstring

* revert readme to rst

* Fix readme

* Update setup.py

Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
Co-authored-by: Bogdan Kostić <bogdankostic@web.de>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-09-18 12:57:32 +02:00
Malte Pietsch
4c503158a7
Fix duplicate vector ids in FAISS (#395)
* fix duplicate vector ids in faiss

* Add test

Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>

* revert score change

* switch to faiss_index.ntotal for ids. add tests

Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>
2020-09-18 12:52:22 +02:00
Tanay Soni
0859da8f74
Fix document filtering in SQLDocumentStore (#396) 2020-09-18 12:22:52 +02:00
Tanay Soni
3399fc784d
Refactor file converter interface (#393) 2020-09-18 10:42:13 +02:00
Malte Pietsch
4e46d9d176 remove dpr_utils.py 2020-09-17 17:17:19 +02:00
Tanay Soni
06243dbda4
Move retriever probability calculations to document_store (#389) 2020-09-17 16:25:46 +02:00
Tanay Soni
03fa4a8740
Exclude embedding fields from the REST API (#390) 2020-09-17 14:37:01 +02:00
Malte Pietsch
3782646948
Add logo to readme (#384)
* add logo image

* add logo to readme

* change img path to master

* Update README.rst
2020-09-16 18:36:22 +02:00
Malte Pietsch
9727829cc6
Rename and restructure modules (database, indexing, schemas) (#379)
* rename database to documentstore

* move document, label, multilabel to haystack/schema.py

* rename documentstore -> document_store

* split indexing modules -> file_converter + preprocessor

* fix order of imports

* Update tutorial notebooks

* fix torch version in tutorial 4
2020-09-16 18:33:23 +02:00
Malte Pietsch
bde33ddaaa
Bump FARM version to 0.4.8 and PyTorch >=1.5.1, <= 1.6.0 (#376)
* bump farm version to 0.4.8

* move back to original transformers pipeline

* remove dpr_utils and use transformers implementation

* update tutorial notebooks
2020-09-16 17:24:40 +02:00
Lalit P
de5ad42e46
Adjust tests for MacOS (#374) 2020-09-15 15:04:46 +02:00
Tanay Soni
c0c2865e58
Add FAISS query scores (#368) 2020-09-11 13:59:38 +02:00
Tanay Soni
9d93ffbe54
Add Gunicorn timeout (#364) 2020-09-10 09:20:39 +02:00
maxupp
06e8be30ea
Add index arg to Finder.get_answers() and _via_similar_questions() (#362)
Co-authored-by: Max Uppenkamp <max.uppenkamp@inform-software.com>
2020-09-09 12:39:13 +02:00
Malte Pietsch
b1cdc68d6c
Update README.rst 2020-09-09 11:47:17 +02:00
Malte Pietsch
d821e8d260
Bump FARM version to 0.4.7 (#340) 2020-09-04 17:29:14 +02:00
Tanay Soni
26e4e7ad7a
Use port 8000 in docs (#357) 2020-09-04 09:54:24 +02:00
Malte Pietsch
a634eee40c
Update README.rst 2020-08-28 13:35:11 +02:00
Malte Pietsch
686e55b77c
Create CONTRIBUTING.md 2020-08-28 13:20:19 +02:00
Tanay Soni
f0fe16774f
Raise exception if filter supplied for query (#338) 2020-08-27 10:05:10 +02:00
Branden Chan
0ad22d5038
Merge pull request #331 from deepset-ai/robust_eval
More robust Reader eval by limiting max answers and creating no answer labels
2020-08-26 13:28:10 +02:00
brandenchan
f108939fc3 Change warning to info 2020-08-26 13:27:30 +02:00
brandenchan
b44b1ac6ec Set top_k_per_candidate 2020-08-26 12:03:56 +02:00
brandenchan
cca8676f90 More robust eval 2020-08-26 12:01:59 +02:00
kolk
f2b6cc761b
Refactor DPR from FB to Transformers codebase (#308)
* change_HFBertEncoder to transformers DPREncoder

* Removed BertTensorizer

* model download relative path

* Refactor model load

* Tutorial5 DPR updated

* fix print_eval_results typo

* copy transformers DPR modules in dpr_utils and test

* transformer v3.0.2 import errors fixed

* remove dependency of DPRConfig on attribute use_return_tuple

* Adjust transformers 302 locally to work with dpr

* projection layer removed from DPR encoders

* fixed mypy errors

* transformers DPR compatible code added

* transformers DPR compatibility added

* bug fix in tutorial 6 notebook

* Docstring update and variable naming issues fix

* tutorial modified to reflect DPR variable naming change

* title addition to passage use-cases handled

* modified handling untitled batch

* resolved mypy errors

* typos in docstrings and comments fixed

* cleaned DPR code and added new test cases

* warnings added for non-bert model [SEP] token removal

* changed warning to logger warning

* title mask creation refactored

* bug fix on cuda issues

* tutorial 6 instantiates modified DPR

* tutorial 5 modified

* tutorial 5 ipython notebook modified: DPR instantiation

* batch_size added to DPR instantiation

* tutorial 5 jupyter notebook typos fixed

* improved docstrings, fixed typos

* Update docstring

Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-08-25 20:16:00 +05:30
venuraja79
ea334658d6
DPR (Dense Retriever) for InMemoryDocumentStore #316 (#332) 2020-08-24 14:48:36 +02:00
Tanay Soni
3a42eb663e
Include InMemoryDocumentStore for DPR test 2020-08-24 14:44:12 +02:00
antoniolanza1996
85c743fe2a
Add refresh_type arg to ElasticsearchDocumentStore (#326)
* Added refresh type into ElasticsearchDocumentStore

* Update docstring

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-08-21 09:36:04 +02:00
Tanay Soni
7d2a8f19fc
Improve speed for SQLDocumentStore (#330) 2020-08-21 09:24:49 +02:00
Branden Chan
a54d6a5bd7
Make Tutorials Work on Colab GPUs (#322)
* Add pip install torch+cu
2020-08-19 14:52:50 +02:00
bogdankostic
f388ca025c
Aggregate multiple no answers in MultiLabel (#324)
* Aggregate multiple no answers

* Add test for multiple no answers
2020-08-18 18:25:01 +02:00
bogdankostic
3a95fe2006
Align TransformersReader with FARMReader (#319)
* Align TransformersReader with FARMReader
2020-08-18 14:26:33 +02:00
bogdankostic
72b1013560
Restructure update embeddings (#304)
* Restructure update embeddings

* Adapt FAISSDocStore

* Adapt test and tutorial

Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
2020-08-18 14:04:31 +02:00
brandenchan
8a3eca05c3 Change to retriever eval top_k to match notebook 2020-08-18 11:39:49 +02:00
bogdankostic
b30963d0cd
Add Tests for MultiLabel (#318)
* Add tests for MultiLabel

* Add test for no_answer and is_correct_answer=False + fix bug in MultiLabel aggregation

* Fix bug in MultiLabel aggregation
2020-08-17 20:14:31 +02:00
Tanay Soni
01ff66dfd6 Remove redundant test fixture 2020-08-17 14:19:38 +02:00
Tanay Soni
200bb4bafd
Refactor the DPR tutorial to use FAISS (#317) 2020-08-17 13:30:02 +02:00
Dany
f0222ecd27 Add Tika in CI 2020-08-17 11:35:33 +02:00
Dany
9d02b3ff9d Add requirement for Tika 2020-08-17 11:32:29 +02:00
Dany
ac2fe58b84 Add Tika Converter file (#314) 2020-08-17 11:26:56 +02:00
Dany
403318b1f5 Add Tika Converter (#314) 2020-08-17 11:21:09 +02:00
Tanay Soni
1637ce1184 Revert "Add Tika Converter (#314)"
This reverts commit 5ef59b1901da6d51bfa085683321a243228d4fc9.
2020-08-17 11:13:52 +02:00
Tanay Soni
5ef59b1901
Add Tika Converter (#314) 2020-08-14 14:13:59 +02:00
Timo Moeller
eb0fc6439c Remove inconsistent underscore in test_filename 2020-08-13 14:53:43 +02:00
Tanay Soni
089fecf99e
Fix indexing of metadata for FAISS/SQL Document Store (#310) 2020-08-13 12:25:32 +02:00
Tanay Soni
397dcf9d92
Ensure exact match when filtering by meta in Elasticsearch (#311) 2020-08-13 11:42:49 +02:00