111 Commits

Author SHA1 Message Date
Guillim
fb5db59590
Remove useless line from Tutorial4_FAQ_style_QA (#416)
* Update Tutorial4_FAQ_style_QA.py

Used to be useful when `.apply()` was necessary, but not any longer

* Update Tutorial4_FAQ_style_QA.ipynb
2020-09-22 09:01:04 +02:00
Malte Pietsch
747e0c0046
Bump FARM to 0.4.9. Remove custom torch installation from colab tutorials (#404) 2020-09-21 10:26:12 +02:00
Malte Pietsch
271ff30262
fix type casting of embeddings for tutorial 4 (#402) 2020-09-18 18:10:50 +02:00
Branden Chan
7fdb85d63a
Create documentation website (#272)
* Skeleton of doc website

* Flesh out documentation pages

* Split concepts into their own rst files

* add tutorial rsts

* Consistent level 1 markdown headers in tutorials

* Change theme to readthedocs

* Turn bullet points into prose

* Populate sections

* Add more text

* Add more sphinx files

* Add more retriever documentation

* combined all documentation in one structure

* rename of src to _src as it was ignored by git

* Incorporate MP2's changes

* add benchmark bar charts

* Adapt docstrings in Readers

* Improvements to intro, creation of glossary

* Adapt docstrings in Retrievers

* Adapt docstrings in Finder

* Adapt Docstrings of Finder

* Updates to text

* Edit text

* update doc strings

* proof read tutorials

* Edit text

* Edit text

* Add stacked chart

* populate graph with data

* Switch Documentation to markdown (#386)

* add way to generate markdown files to sphinx

* changed from rst to markdown and extended sphinx for it

* fix spelling

* Clean titles

* delete file

* change spelling

* add sections to document store usage

* add basic rest api docs

* fix readme in setup.py

* Update Tutorials

* Change section names

* add windows note to pip install

* update intro

* new renderer for markdown files

* Fix typos

* delete dpr_utils.py

* fix windows note in get started

* Fix docstrings

* deleted rest api docs in api

* fixed typo

* Fix docstring

* revert readme to rst

* Fix readme

* Update setup.py

Co-authored-by: deepset <deepset@Crenolape.localdomain>
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
Co-authored-by: Bogdan Kostić <bogdankostic@web.de>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-09-18 12:57:32 +02:00
Malte Pietsch
9727829cc6
Rename and restructure modules (database, indexing, schemas) (#379)
* rename database to documentstore

* move document, label, multilabel to haystack/schema.py

* rename documentstore -> document_store

* split indexing modules -> file_converter + preprocessor

* fix order of imports

* Update tutorial notebooks

* fix torch version in tutorial 4
2020-09-16 18:33:23 +02:00
Malte Pietsch
bde33ddaaa
Bump FARM version to 0.4.8 and PyTorch >=1.5.1, <= 1.6.0 (#376)
* bump farm version to 0.4.8

* move back to original transformers pipeline

* remove dpr_utils and use transformers implementation

* update tutorial notebooks
2020-09-16 17:24:40 +02:00
brandenchan
b44b1ac6ec Set top_k_per_candidate 2020-08-26 12:03:56 +02:00
kolk
f2b6cc761b
Refactor DPR from FB to Transformers codebase (#308)
* change HFBertEncoder to transformers DPREncoder

* Removed BertTensorizer

* model download relative path

* Refactor model load

* Tutorial5 DPR updated

* fix print_eval_results typo

* copy transformers DPR modules in dpr_utils and test

* transformer v3.0.2 import errors fixed

* remove dependency of DPRConfig on attribute use_return_tuple

* Adjust transformers 302 locally to work with dpr

* projection layer removed from DPR encoders

* fixed mypy errors

* transformers DPR compatible code added

* transformers DPR compatibility added

* bug fix in tutorial 6 notebook

* Docstring update and variable naming issues fix

* tutorial modified to reflect DPR variable naming change

* title addition to passage use-cases handled

* modified handling untitled batch

* resolved mypy errors

* typos in docstrings and comments fixed

* cleaned DPR code and added new test cases

* warnings added for non-bert model [SEP] token removal

* changed warning to logger warning

* title mask creation refactored

* bug fix on cuda issues

* tutorial 6 instantiates modified DPR

* tutorial 5 modified

* tutorial 5 ipython notebook modified: DPR instantiation

* batch_size added to DPR instantiation

* tutorial 5 jupyter notebook typos fixed

* improved docstrings, fixed typos

* Update docstring

Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2020-08-25 20:16:00 +05:30
Branden Chan
a54d6a5bd7
Make Tutorials Work on Colab GPUs (#322)
* Add pip install torch+cu
2020-08-19 14:52:50 +02:00
bogdankostic
72b1013560
Restructure update embeddings (#304)
* Restructure update embeddings

* Adapt FAISSDocStore

* Adapt test and tutorial

Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>
2020-08-18 14:04:31 +02:00
brandenchan
8a3eca05c3 Change to retriever eval top_k to match notebook 2020-08-18 11:39:49 +02:00
Tanay Soni
200bb4bafd
Refactor the DPR tutorial to use FAISS (#317) 2020-08-17 13:30:02 +02:00
Timo Moeller
72e6867278
Aggregate label objects for same questions (#292)
* Add aggregate labels obj, use in retriever eval function

* Change launch ES param

* Move aggregation from ES document store to base class

* Fix type annotations
2020-08-07 11:24:41 +02:00
Malte Pietsch
29a15c0d59
Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243) 2020-07-31 11:34:06 +02:00
Malte Pietsch
5b1be233d0 Update Tutorial 4 2020-07-17 19:31:00 +02:00
Malte Pietsch
6bed2f509f
Refactor DPR for latest transformers version & change init arg gpu -> use_gpu for DPR and EmbeddingRetriever (#239)
* fix tokenizer warning in latest transformers

* change dpr arg from gpu to use_gpu

* change gpu arg for EmbeddingRetriever
2020-07-16 10:45:01 +02:00
Malte Pietsch
c9d3146fae
Fix multi-gpu training via DataParallel (#234) 2020-07-15 18:34:55 +02:00
Branden Chan
36867dabac change from top_n_recall to accuracy 2020-07-15 17:05:08 +02:00
Branden Chan
64721d3196 One more update 2020-07-15 16:24:10 +02:00
Branden Chan
c55477e0ce update eval dataset 2020-07-15 16:14:52 +02:00
Malte Pietsch
99a6a34047
Upgrade to new FARM / Transformers / PyTorch versions (#212) 2020-07-14 18:53:15 +02:00
Tanay Soni
4c21556a79
Fix embedding method for Retriever (#220) 2020-07-13 12:38:01 +02:00
Malte Pietsch
fe33a481ad
Update tutorials (#200)
* fix link in readme. update installation in tutorials

* update haystack version to latest master

* add basic documentation for input to write_documents()

* add docstring for sqldocumentstore

* comment out docker in notebook
2020-07-07 14:59:01 +02:00
Malte Pietsch
c36f8c991e Update Tutorial 6 2020-07-03 16:06:46 +02:00
Malte Pietsch
8a9f97fad3
Tutorial for Dense Passage Retriever (#186) 2020-07-03 15:53:58 +02:00
Malte Pietsch
07ecfb60b9
Dense Passage Retriever (Inference) (#167) 2020-06-30 19:05:45 +02:00
Timo Moeller
c53aaddb78
Fix document id missing in farm inference output (#174) 2020-06-26 11:01:10 +02:00
Tanay Soni
44f89c94ab
Upgrade FARM version (#172) 2020-06-24 15:14:09 +02:00
Yaser Martinez Palenzuela
97bbb4280c
Correct field in evaluation tutorial (#139) 2020-06-08 16:38:09 +02:00
Tanay Soni
71e15a5a11
Update Haystack version in tutorials (#136) 2020-06-08 11:31:12 +02:00
Tanay Soni
ef9e4f4467
Add PDF text extraction (#109) 2020-06-08 11:07:19 +02:00
bogdankostic
479fcb1ace
Fix evaluation (#132)
* Fix bugs in Tutorial 5

* Adapt tutorials to new metrics
2020-06-05 18:33:50 +02:00
bogdankostic
bbfccf5cf6
Add Evaluation of Reader, Retriever and Finder (#92) 2020-05-29 15:57:07 +02:00
Branden Chan
5c68a5d755
Move save_dir from FARMReader() to reader.train() 2020-05-26 12:14:35 +02:00
Branden Chan
cbe62044b1
Update colab link 2020-05-26 11:56:24 +02:00
Malte Pietsch
c468200a19
Split docs into passages in Tutorial 2020-05-21 13:01:48 +02:00
Malte Pietsch
d5443b36ec
Split docs into passages in Tutorial 2020-05-21 13:01:04 +02:00
Malte Pietsch
a431a94b04
Add basic tutorial for FAQ-based QA & batch comp. of embeddings (#98)
* Add basic tutorial for FAQ-based QA and switch to batch computation of embeddings

* update readme & haystack version in tutorial
2020-05-07 10:19:26 +02:00
Malte Pietsch
f58f58fc86
Make saving more explicit in tutorial 2 (#95) 2020-05-06 12:13:49 +02:00
Malte Pietsch
d595886630 split docs into passages in tutorials 2020-04-30 19:27:15 +02:00
Malte Pietsch
7b01fb3fbc Merge branch 'master' of github.com:deepset-ai/haystack 2020-04-30 19:03:44 +02:00
Malte Pietsch
7972038afc update tutorials 2020-04-30 19:00:41 +02:00
Malte Pietsch
438543a18a
pin haystack version in tutorials until release (#87) 2020-04-30 18:44:44 +02:00
Tanay Soni
887bdcc376
Update tutorials to use Elasticsearch, new Retrievers (#79) 2020-04-29 14:01:05 +02:00
Branden Chan
420e11695b Remove use_gpu param 2020-03-24 17:47:00 +01:00
bogdankostic
0048ee9c5c
Added Jupyter notebooks of Tutorials (#43)
Add Jupyter and Colab notebooks of tutorials
2020-03-17 19:58:53 +01:00
timoeller
f681026a56 Simplify no ans handling, disable no ans + sorting in private function 2020-02-24 16:15:06 +01:00
timoeller
ef9b99c3cc Add no answer handling and sort no answer into positive predictions 2020-02-21 18:27:53 +01:00
timoeller
840b368732 Add no ans example 2020-02-19 14:51:12 +01:00
timoeller
c6d9da8827 Add doc for no answer boosting 2020-02-19 13:02:51 +01:00