haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-12-19 19:19:00 +00:00

Author	SHA1	Message	Date
kolk	72b637ae6d	DensePassageRetriever: Add Training, Refactor Inference to FARM modules (#527 ) * dpr training and inference code refactored with FARM modules * dpr test cases modified * docstring and default arguments updated * dpr training docstring updated * bugfix in dense retriever inference, DPR tutorials modified * Bump FARM to 0.5.0 * update README for DPR * dpr training and inference code refactored with FARM modules * dpr test cases modified * docstring and default arguments updated * dpr training docstring updated * bugfix in dense retriever inference, DPR tutorials modified * Bump FARM to 0.5.0 * update README for DPR * mypy errors fix * DPR instantiation bugfix * Fix DPR init in RAG Tutorial Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-10-30 19:22:06 +01:00
Lalit Pagaria	f13443054a	[RAG] Integrate "Retrieval-Augmented Generation" with Haystack (#484 ) * Adding dummy generator implementation * Adding tutorial to try the model * Committing current non working code * Committing current update where we need to call generate function directly and need to convert embedding to tensor way * Addressing review comments. * Refactoring finder, and implementing rag_generator class. * Refined the implementation of RAGGenerator and now it is in clean shape * Renaming RAGGenerator to RAGenerator * Reverting change from finder.py and addressing review comments * Remove support for RagSequenceForGeneration * Utilizing embed_passage function from DensePassageRetriever * Adding sample test data to verify generator output * Updating testing script * Updating testing script * Fixing bug related to top_k * Updating latest farm dependency * Comment out farm dependency * Reverting changes from TransformersReader * Adding transformers dataset to compare transformers and haystack generator implementation * Using generator_encoder instead of question_encoder to generate context_input_ids * Adding workaround to install FARM dependency from master branch * Removing unnecessary changes * Fixing generator test * Removing transformers datasets * Fixing generator test * Some cleanup and updating TODO comments * Adding tutorial notebook * Updating tutorials with comments * Explicitly passing token model in RAG test * Addressing review comments * Fixing notebook * Refactoring tests to reduce memory footprint * Split generator tests in separate ci step and before running it reclaim memory by terminating containers * Moving tika dependent test to separate dir * Remove unwanted code * Brining reader under session scope * Farm is now session object hence restoring changes from default value * Updating assert for pdf converter * Dummy commit to trigger CI flow * REducing memory footprint required for generator tests * Fixing mypy issues * Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits * reducing changes * Fixing CI * changing elastic search ci * Fixing test error * Disabling return of embedding * Marking generator test as well * Refactoring tutorials * Increasing ES memory to 750M * Trying another fix for ES CI * Reverting CI changes * Splitting tests in CI * Generator and non-generator markers split * Adding pytest.ini to add markers and enable strict-markers option * Reducing elastic search container memory * Simplifying generator test by using documents with embedding directly * Bump up farm to 0.5.0	2020-10-30 18:06:02 +01:00
Tanay Soni	db4151bbc0	Fix scoring in Elasticsearch for dot product (#517 )	2020-10-23 17:50:49 +02:00
bogdankostic	f62117c232	Add urllib version requirement to colab notebooks (#509 )	2020-10-23 10:43:58 +02:00
Lalit Pagaria	63c12371b9	Change arg "model" to "model_name_or_path" in TransformersReader (#510 ) * Consistent parameter naming for TransformersReader along with removing unused imports as well. * Addressing review comments	2020-10-21 17:15:35 +02:00
Malte Pietsch	bdbd1b323b	Add create_index and similarity metric to api config (#493 ) * make creation of label index optional * add params for rest api * reset tutorial flag	2020-10-15 18:41:36 +02:00
Guillim	fb5db59590	Remove useless line from Tutorial4_FAQ_style_QA (#416 ) * Update Tutorial4_FAQ_style_QA.py Used to be useful when `.apply()` was necessary, but not any longer * Update Tutorial4_FAQ_style_QA.ipynb	2020-09-22 09:01:04 +02:00
Malte Pietsch	747e0c0046	Bump FARM to 0.4.9. Remove custom torch installation from colab tutorials (#404 )	2020-09-21 10:26:12 +02:00
Malte Pietsch	271ff30262	fix type casting of embeddings for tutorial 4 (#402 )	2020-09-18 18:10:50 +02:00
Branden Chan	7fdb85d63a	Create documentation website (#272 ) * Skeleton of doc website * Flesh out documentation pages * Split concepts into their own rst files * add tutorial rsts * Consistent level 1 markdown headers in tutorials * Change theme to readthedocs * Turn bullet points into prose * Populate sections * Add more text * Add more sphinx files * Add more retriever documentation * combined all documenations in one structure * rename of src to _src as it was ignored by git * Incorporate MP2's changes * add benchmark bar charts * Adapt docstrings in Readers * Improvements to intro, creation of glossary * Adapt docstrings in Retrievers * Adapt docstrings in Finder * Adapt Docstrings of Finder * Updates to text * Edit text * update doc strings * proof read tutorials * Edit text * Edit text * Add stacked chart * populate graph with data * Switch Documentation to markdown (#386) * add way to generate markdown files to sphinx * changed from rst to markdown and extended sphinx for it * fix spelling * Clean titles * delete file * change spelling * add sections to document store usage * add basic rest api docs * fix readme in setup.py * Update Tutorials * Change section names * add windows note to pip install * update intro * new renderer for markdown files * Fix typos * delete dpr_utils.py * fix windows note in get started * Fix docstrings * deleted rest api docs in api * fixed typo * Fix docstring * revert readme to rst * Fix readme * Update setup.py Co-authored-by: deepset <deepset@Crenolape.localdomain> Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com> Co-authored-by: Bogdan Kostić <bogdankostic@web.de> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-09-18 12:57:32 +02:00
Malte Pietsch	9727829cc6	Rename and restructure modules (database, indexing, schemas) (#379 ) * rename database to documentstore * move document, label, multilabel to haystack/schema.py * rename documentstore -> document_store * split indexing modules -> file_converter + preprocessor * fix order of imports * Update tutorial notebooks * fix torch version in tutorial 4	2020-09-16 18:33:23 +02:00
Malte Pietsch	bde33ddaaa	Bump FARM version to 0.4.8 and PyTorch >=1.5.1, <= 1.6.0 (#376 ) * bump farm version to 0.4.8 * move back to original transformers pipeline * remove dpr_utils and use transformers implementation * update tutorial notebooks	2020-09-16 17:24:40 +02:00
brandenchan	b44b1ac6ec	Set top_k_per_candidate	2020-08-26 12:03:56 +02:00
kolk	f2b6cc761b	Refactor DPR from FB to Transformers codebase (#308 ) * change_HFBertEncoder to transformers DPREncoder * Removed BertTensorizer * model download relative path * Refactor model load * Tutorial5 DPR updated * fix print_eval_results typo * copy transformers DPR modules in dpr_utils and test * transformer v3.0.2 import errors fixed * remove dependency of DPRConfig on attribute use_return_tuple * Adjust transformers 302 locally to work with dpr * projection layer removed from DPR encoders * fixed mypy errors * transformers DPR compatible code added * transformers DPR compatibility added * bug fix in tutorial 6 notebook * Docstring update and variable naming issues fix * tutorial modified to reflect DPR variable naming change * title addition to passage use-cases handled * modified handling untitled batch * resolved mypy errors * typos in docstrings and comments fixed * cleaned DPR code and added new test cases * warnings added for non-bert model [SEP] token removal * changed warning to logger warning * title mask creation refactored * bug fix on cuda issues * tutorial 6 instantiates modified DPR * tutorial 5 modified * tutorial 5 ipython notebook modified: DPR instantiation * batch_size added to DPR instantiation * tutorial 5 jupyter notebook typos fixed * improved docstrings, fixed typos * Update docstring Co-authored-by: Timo Moeller <timo.moeller@deepset.ai> Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-08-25 20:16:00 +05:30
Branden Chan	a54d6a5bd7	Make Tutorials Work on Colab GPUs (#322 ) * Add pip install torch+cu	2020-08-19 14:52:50 +02:00
bogdankostic	72b1013560	Restructure update embeddings (#304 ) * Restructure update embeddings * Adapt FAISSDocStore * Adapt test and tutorial Co-authored-by: Timo Moeller <timo.moeller@deepset.ai>	2020-08-18 14:04:31 +02:00
brandenchan	8a3eca05c3	Change to retriever eval top_k to match notebook	2020-08-18 11:39:49 +02:00
Tanay Soni	200bb4bafd	Refactor the DPR tutorial to use FAISS (#317 )	2020-08-17 13:30:02 +02:00
Timo Moeller	72e6867278	Aggregate label objects for same questions (#292 ) * Add aggregate labels obj, use in retriever eval function * Change launch ES param * Move aggregation from ES document store to base class * Fix type annotations	2020-08-07 11:24:41 +02:00
Malte Pietsch	29a15c0d59	Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243 )	2020-07-31 11:34:06 +02:00
Malte Pietsch	5b1be233d0	Update Tutorial 4	2020-07-17 19:31:00 +02:00
Malte Pietsch	6bed2f509f	Refactor DPR for latest transformers version & change init arg `gpu` -> `use_gpu` for DPR and EmbeddingRetriever (#239 ) * fix tokenizer warning in latest transformers * change dpr arg from gpu to use_gpu * change gpu arg for EmbeddingRetriever	2020-07-16 10:45:01 +02:00
Malte Pietsch	c9d3146fae	Fix multi-gpu training via DataParallel (#234 )	2020-07-15 18:34:55 +02:00
Branden Chan	36867dabac	change from top_n_recall to accuracy	2020-07-15 17:05:08 +02:00
Branden Chan	64721d3196	One more update	2020-07-15 16:24:10 +02:00
Branden Chan	c55477e0ce	update eval dataset	2020-07-15 16:14:52 +02:00
Malte Pietsch	99a6a34047	Upgrade to new FARM / Transformers / PyTorch versions (#212 )	2020-07-14 18:53:15 +02:00
Tanay Soni	4c21556a79	Fix embedding method for Retriever (#220 )	2020-07-13 12:38:01 +02:00
Malte Pietsch	fe33a481ad	Update tutorials (#200 ) * fix link in readme. update installation in tutorials * update haystack version to latest master * add basic documentation for input to write_documents() * add docstring for sqldocumentstore * comment out docker in notebook	2020-07-07 14:59:01 +02:00
Malte Pietsch	c36f8c991e	Update Tutorial 6	2020-07-03 16:06:46 +02:00
Malte Pietsch	8a9f97fad3	Tutorial for Dense Passage Retriever (#186 )	2020-07-03 15:53:58 +02:00
Malte Pietsch	07ecfb60b9	Dense Passage Retriever (Inference) (#167 )	2020-06-30 19:05:45 +02:00
Timo Moeller	c53aaddb78	Fix document id missing in farm inference output (#174 )	2020-06-26 11:01:10 +02:00
Tanay Soni	44f89c94ab	Upgrade FARM version (#172 )	2020-06-24 15:14:09 +02:00
Yaser Martinez Palenzuela	97bbb4280c	Correct field in evaluation tutorial (#139 )	2020-06-08 16:38:09 +02:00
Tanay Soni	71e15a5a11	Update Haystack version in tutorials (#136 )	2020-06-08 11:31:12 +02:00
Tanay Soni	ef9e4f4467	Add PDF text extraction (#109 )	2020-06-08 11:07:19 +02:00
bogdankostic	479fcb1ace	Fix evaluation (#132 ) * Fix bugs in Tutorial 5 * Adapt tutorials to new metrics	2020-06-05 18:33:50 +02:00
bogdankostic	bbfccf5cf6	Add Evaluation of Reader, Retriever and Finder (#92 )	2020-05-29 15:57:07 +02:00
Branden Chan	5c68a5d755	Move save_dir from FARMReader() to reader.train()	2020-05-26 12:14:35 +02:00
Branden Chan	cbe62044b1	Update colab link	2020-05-26 11:56:24 +02:00
Malte Pietsch	c468200a19	Split docs into passages in Tutorial	2020-05-21 13:01:48 +02:00
Malte Pietsch	d5443b36ec	Split docs into passages in Tutorial	2020-05-21 13:01:04 +02:00
Malte Pietsch	a431a94b04	Add basic tutorial for FAQ-based QA & batch comp. of embeddings (#98 ) * Add basic tutorial for FAQ-based QA and switch to bach computation of embeddings * update readme & haystack version in tutorial	2020-05-07 10:19:26 +02:00
Malte Pietsch	f58f58fc86	Make saving more explicit in tutorial 2 (#95 )	2020-05-06 12:13:49 +02:00
Malte Pietsch	d595886630	split docs into passages in tutorials	2020-04-30 19:27:15 +02:00
Malte Pietsch	7b01fb3fbc	Merge branch 'master' of github.com:deepset-ai/haystack	2020-04-30 19:03:44 +02:00
Malte Pietsch	7972038afc	update tutorials	2020-04-30 19:00:41 +02:00
Malte Pietsch	438543a18a	pin haystack version in tutorials until release (#87 )	2020-04-30 18:44:44 +02:00
Tanay Soni	887bdcc376	Update tutorials to use Elasticsearch, new Retrievers (#79 )	2020-04-29 14:01:05 +02:00

67 Commits