Malte Pietsch
db6864d159
Fix type casting for vectors in FAISS (#399)
* Fix type casting for vectors in FAISS
Co-authored-by: philipp-bode <philipp.bode@student.hpi.de>
* add type casts for elastic. refactor embedding retriever tests
* fix case: empty embedding field
* fix faiss tolerance
* add assert in test_faiss_retrieving
Co-authored-by: philipp-bode <philipp.bode@student.hpi.de>
2020-09-18 17:08:13 +02:00
Branden Chan
4ea4cfd282
Merge pull request #400 from deepset-ai/fix_imgs
Fix images in readme
2020-09-18 15:01:20 +02:00
brandenchan
f4a1682570
Fix images
2020-09-18 14:58:03 +02:00
Malte Pietsch
d69133966d
Fix faiss test tolerance
2020-09-18 13:57:29 +02:00
Branden Chan
7fdb85d63a
Create documentation website (#272)
* Skeleton of doc website
* Flesh out documentation pages
* Split concepts into their own rst files
* add tutorial rsts
* Consistent level 1 markdown headers in tutorials
* Change theme to readthedocs
* Turn bullet points into prose
* Populate sections
* Add more text
* Add more sphinx files
* Add more retriever documentation
* combined all documentations into one structure
* rename of src to _src as it was ignored by git
* Incorporate MP2's changes
* add benchmark bar charts
* Adapt docstrings in Readers
* Improvements to intro, creation of glossary
* Adapt docstrings in Retrievers
* Adapt docstrings in Finder
* Adapt Docstrings of Finder
* Updates to text
* Edit text
* update doc strings
* proofread tutorials
* Edit text
* Edit text
* Add stacked chart
* populate graph with data
* Switch Documentation to markdown (#386)
* add way to generate markdown files to sphinx
* changed from rst to markdown and extended sphinx for it
* fix spelling
* Clean titles
* delete file
* change spelling
* add sections to document store usage
* add basic rest api docs
* fix readme in setup.py
* Update Tutorials
* Change section names
* add windows note to pip install
* update intro
* new renderer for markdown files
* Fix typos
* delete dpr_utils.py
* fix windows note in get started
* Fix docstrings
* deleted rest api docs in api
* fixed typo
* Fix docstring
* revert readme to rst
* Fix readme
* Update setup.py
Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
Co-authored-by: Bogdan Kostić <bogdankostic@web.de>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-09-18 12:57:32 +02:00
Malte Pietsch
4c503158a7
Fix duplicate vector ids in FAISS (#395)
* fix duplicate vector ids in faiss
* Add test
Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>
* revert score change
* switch to faiss_index.ntotal for ids. add tests
Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>
2020-09-18 12:52:22 +02:00
Tanay Soni
0859da8f74
Fix document filtering in SQLDocumentStore (#396)
2020-09-18 12:22:52 +02:00
Tanay Soni
3399fc784d
Refactor file converter interface (#393)
2020-09-18 10:42:13 +02:00
Malte Pietsch
4e46d9d176
remove dpr_utils.py
2020-09-17 17:17:19 +02:00
Tanay Soni
06243dbda4
Move retriever probability calculations to document_store (#389)
2020-09-17 16:25:46 +02:00
Tanay Soni
03fa4a8740
Exclude embedding fields from the REST API (#390)
2020-09-17 14:37:01 +02:00
Malte Pietsch
3782646948
Add logo to readme (#384)
* add logo image
* add logo to readme
* change img path to master
* Update README.rst
2020-09-16 18:36:22 +02:00
Malte Pietsch
9727829cc6
Rename and restructure modules (database, indexing, schemas) (#379)
* rename database to documentstore
* move document, label, multilabel to haystack/schema.py
* rename documentstore -> document_store
* split indexing modules -> file_converter + preprocessor
* fix order of imports
* Update tutorial notebooks
* fix torch version in tutorial 4
2020-09-16 18:33:23 +02:00
Malte Pietsch
bde33ddaaa
Bump FARM version to 0.4.8 and PyTorch >=1.5.1, <=1.6.0 (#376)
* bump farm version to 0.4.8
* move back to original transformers pipeline
* remove dpr_utils and use transformers implementation
* update tutorial notebooks
2020-09-16 17:24:40 +02:00
Lalit P
de5ad42e46
Adjust tests for MacOS (#374)
2020-09-15 15:04:46 +02:00
Tanay Soni
c0c2865e58
Add FAISS query scores (#368)
2020-09-11 13:59:38 +02:00
Tanay Soni
9d93ffbe54
Add Gunicorn timeout (#364)
2020-09-10 09:20:39 +02:00
maxupp
06e8be30ea
Add index arg to Finder.get_answers() and _via_similar_questions() (#362)
Co-authored-by: Max Uppenkamp <max.uppenkamp@inform-software.com>
2020-09-09 12:39:13 +02:00
Malte Pietsch
b1cdc68d6c
Update README.rst
2020-09-09 11:47:17 +02:00
Malte Pietsch
d821e8d260
Bump FARM version to 0.4.7 (#340)
2020-09-04 17:29:14 +02:00
Tanay Soni
26e4e7ad7a
Use port 8000 in docs (#357)
2020-09-04 09:54:24 +02:00
Malte Pietsch
a634eee40c
Update README.rst
2020-08-28 13:35:11 +02:00
Malte Pietsch
686e55b77c
Create CONTRIBUTING.md
2020-08-28 13:20:19 +02:00
Tanay Soni
f0fe16774f
Raise exception if filter supplied for query (#338)
2020-08-27 10:05:10 +02:00
Branden Chan
0ad22d5038
Merge pull request #331 from deepset-ai/robust_eval
More robust Reader eval by limiting max answers and creating no answer labels
2020-08-26 13:28:10 +02:00
brandenchan
f108939fc3
Change warning to info
2020-08-26 13:27:30 +02:00
brandenchan
b44b1ac6ec
Set top_k_per_candidate
2020-08-26 12:03:56 +02:00
brandenchan
cca8676f90
More robust eval
2020-08-26 12:01:59 +02:00
kolk
f2b6cc761b
Refactor DPR from FB to Transformers codebase (#308)
* change_HFBertEncoder to transformers DPREncoder
* Removed BertTensorizer
* model download relative path
* Refactor model load
* Tutorial5 DPR updated
* fix print_eval_results typo
* copy transformers DPR modules in dpr_utils and test
* transformers v3.0.2 import errors fixed
* remove dependency of DPRConfig on attribute use_return_tuple
* Adjust transformers 3.0.2 locally to work with dpr
* projection layer removed from DPR encoders
* fixed mypy errors
* transformers DPR compatible code added
* transformers DPR compatibility added
* bug fix in tutorial 6 notebook
* Docstring update and variable naming issues fix
* tutorial modified to reflect DPR variable naming change
* title addition to passage use-cases handled
* modified handling untitled batch
* resolved mypy errors
* typos in docstrings and comments fixed
* cleaned DPR code and added new test cases
* warnings added for non-bert model [SEP] token removal
* changed warning to logger warning
* title mask creation refactored
* bug fix on cuda issues
* tutorial 6 instantiates modified DPR
* tutorial 5 modified
* tutorial 5 ipython notebook modified: DPR instantiation
* batch_size added to DPR instantiation
* tutorial 5 jupyter notebook typos fixed
* improved docstrings, fixed typos
* Update docstring
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-08-25 20:16:00 +05:30
venuraja79
ea334658d6
DPR (Dense Retriever) for InMemoryDocumentStore #316 (#332)
2020-08-24 14:48:36 +02:00
Tanay Soni
3a42eb663e
Include InMemoryDocumentStore for DPR test
2020-08-24 14:44:12 +02:00
antoniolanza1996
85c743fe2a
Add refresh_type arg to ElasticsearchDocumentStore (#326)
* Added refresh type into ElasticsearchDocumentStore
* Update docstring
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-08-21 09:36:04 +02:00
Tanay Soni
7d2a8f19fc
Improve speed for SQLDocumentStore (#330)
2020-08-21 09:24:49 +02:00
Branden Chan
a54d6a5bd7
Make Tutorials Work on Colab GPUs (#322)
* Add pip install torch+cu
2020-08-19 14:52:50 +02:00
bogdankostic
f388ca025c
Aggregate multiple no answers in MultiLabel (#324)
* Aggregate multiple no answers
* Add test for multiple no answers
2020-08-18 18:25:01 +02:00
bogdankostic
3a95fe2006
Align TransformersReader with FARMReader (#319)
* Align TransformersReader with FARMReader
2020-08-18 14:26:33 +02:00
bogdankostic
72b1013560
Restructure update embeddings (#304)
* Restructure update embeddings
* Adapt FAISSDocStore
* Adapt test and tutorial
Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
2020-08-18 14:04:31 +02:00
brandenchan
8a3eca05c3
Change to retriever eval top_k to match notebook
2020-08-18 11:39:49 +02:00
bogdankostic
b30963d0cd
Add Tests for MultiLabel (#318)
* Add tests for MultiLabel
* Add test for no_answer and is_correct_answer=False + fix bug in MultiLabel aggregation
* Fix bug in MultiLabel aggregation
2020-08-17 20:14:31 +02:00
Tanay Soni
01ff66dfd6
Remove redundant test fixture
2020-08-17 14:19:38 +02:00
Tanay Soni
200bb4bafd
Refactor the DPR tutorial to use FAISS (#317)
2020-08-17 13:30:02 +02:00
Dany
f0222ecd27
Add Tika in CI
2020-08-17 11:35:33 +02:00
Dany
9d02b3ff9d
Add requirement for Tika
2020-08-17 11:32:29 +02:00
Dany
ac2fe58b84
Add Tika Converter file (#314)
2020-08-17 11:26:56 +02:00
Dany
403318b1f5
Add Tika Converter (#314)
2020-08-17 11:21:09 +02:00
Tanay Soni
1637ce1184
Revert "Add Tika Converter (#314)"
This reverts commit 5ef59b1901da6d51bfa085683321a243228d4fc9.
2020-08-17 11:13:52 +02:00
Tanay Soni
5ef59b1901
Add Tika Converter (#314)
2020-08-14 14:13:59 +02:00
Timo Moeller
eb0fc6439c
Remove inconsistent underscore in test_filename
2020-08-13 14:53:43 +02:00
Tanay Soni
089fecf99e
Fix indexing of metadata for FAISS/SQL Document Store (#310)
2020-08-13 12:25:32 +02:00
Tanay Soni
397dcf9d92
Ensure exact match when filtering by meta in Elasticsearch (#311)
2020-08-13 11:42:49 +02:00