haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-07-23 08:52:16 +00:00

Author	SHA1	Message	Date
Malte Pietsch	a0921f0c35	Remove `Finder` (#1326 ) * deprecate finder * remove import * add doc section for moving from finder to pipelines	2021-08-09 13:41:40 +02:00
Branden Chan	aa6f768efa	Prevent merge of same questions on different documents during evaluation (#1119 ) * Fix duplicate question in Reader.eval() * Add duplicate question support in document store * Support duplicate questions in retriever eval * Update tutorial * Rename key_tuple * Change error message * Add warning when more than 6 labels * Allow for label grouping options * Add support for aggregating by label meta * Satisfy mypy * Fix duplicate question in Reader.eval() * Add duplicate question support in document store * Support duplicate questions in retriever eval * Update tutorial * Rename key_tuple * Change error message * Add warning when more than 6 labels * Allow for label grouping options * Add support for aggregating by label meta * Satisfy mypy * Make label field flexible, add docstrings * Satisfy mypy * Fix failing tests * Adjust docstring * Fix tutorial Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-06-02 12:09:03 +02:00
Julian Risch	84c34295a1	Re-ranking component for document search without QA (#1025 ) * Adding ranker similar to retriever and reader * Sort documents according to query-document similarity scores * Reranking and model training runs for small example * Added EvalRanker node * Calculate recall@k in EvalRetriever and EvalRanker nodes * Renaming EvalRetriever to EvalDocuments and EvalReader to EvalAnswers * Added mean reciprocal rank as metric for EvalDocuments * Fix bug that appeared when ranking documents with same score * Remove commented code for unimplmented eval() of Ranker node * Add documentation of k parameter in EvalDocuments * Add Ranker docu and renaming top_k param	2021-05-31 15:31:36 +02:00
Julian Risch	bf4563e5d2	Filtering duplicate answers (#1021 ) * Allow filtering of duplicate answers as implemented in FARM * Changed default behavior to filtering exact duplicates * Change expected test result due to filtering of duplicate answers by default * Rounding expected test results for comparison with predictions	2021-05-03 17:18:10 +02:00
Branden Chan	d77152c469	WIP: Add evaluation nodes for Pipelines (#904 ) * Add main eval fns * WIP: make pipeline_eval.py run * Fix typo * Add support for no_answers * Add latest docstring and tutorial changes * Working pipeline eval * Add timing of nodes * Add latest docstring and tutorial changes * Refactor and clean * Update tutorial script * Set default params * Update tutorials * Fix indent * Add latest docstring and tutorial changes * Address mypy issues * Add test * Fix mypy error * Clear outputs * Add doc strings * Incorporate reviewer feedback * Add latest docstring and tutorial changes * Revert query counting * Fix typo Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2021-04-01 17:35:18 +02:00
Branden Chan	f3a3b73d9b	Choose correct similarity fns during benchmark runs & re-run benchmarks (#773 ) * Adapt to new dataset_from_dicts return signature * rename fn * Align similarity fn in benchmark doc store * Better choice of similarity fn * Increase postgres wait time * Add more expected returned variables * update benchmark results * Fix typo * update all benchmark runs * multiply stats by 100 * Specify similarity fns for website Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-02-03 11:45:18 +01:00
Tanay Soni	337376c81d	Add `batch_size` and generators to document stores. (#733 ) * Add batch update of embeddings in document stores * Resolve merge conflict * Remove document ordering dependency in tests * Adjust index buffer size for tests * Adjust ES Scroll Slice * Use generator for document store pagination * Add pagination for InMemoryDocumentStore * Fix missing index parameter in FAISS update_embeddings() * Fix FAISS update_embeddings() * Update FAISS tests * Update eval tests * Revert code formatting change * Fix document count in FAISS update embeddings * Fix vector_ids reset in SQLDocumentStore * Update doctrings * Update docstring	2021-01-21 16:00:08 +01:00
Timo Moeller	4803da009a	Using PreProcessor functions on eval data (#751 ) * Add eval data splitting * Adjust for split by passage, add test and test data, adjust docstrings, add max_docs to highler level fct	2021-01-20 14:40:10 +01:00
bogdankostic	7709b6cee0	Make batchwise adding of evaluation data possible (#717 ) * Make batchwise adding of evaluation data possible * Fix typos in docstrings * Merge add_eval_data and add_eval_data_batchwise * Improve import statements * Move add_eval_data to BaseDocumentStore * Add batch_size param to write_documents and write_labels in EsDocStore * Adjust docstring Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-12 17:54:43 +01:00
bogdankostic	ffaa0249f7	Fix retriever evaluation metrics (#547 ) * Add mean reciprocal rank and fix mean average precision * Add mrr metric to docstring * Fix mypy error	2020-11-05 13:34:47 +01:00
Lalit Pagaria	f13443054a	[RAG] Integrate "Retrieval-Augmented Generation" with Haystack (#484 ) * Adding dummy generator implementation * Adding tutorial to try the model * Committing current non working code * Committing current update where we need to call generate function directly and need to convert embedding to tensor way * Addressing review comments. * Refactoring finder, and implementing rag_generator class. * Refined the implementation of RAGGenerator and now it is in clean shape * Renaming RAGGenerator to RAGenerator * Reverting change from finder.py and addressing review comments * Remove support for RagSequenceForGeneration * Utilizing embed_passage function from DensePassageRetriever * Adding sample test data to verify generator output * Updating testing script * Updating testing script * Fixing bug related to top_k * Updating latest farm dependency * Comment out farm dependency * Reverting changes from TransformersReader * Adding transformers dataset to compare transformers and haystack generator implementation * Using generator_encoder instead of question_encoder to generate context_input_ids * Adding workaround to install FARM dependency from master branch * Removing unnecessary changes * Fixing generator test * Removing transformers datasets * Fixing generator test * Some cleanup and updating TODO comments * Adding tutorial notebook * Updating tutorials with comments * Explicitly passing token model in RAG test * Addressing review comments * Fixing notebook * Refactoring tests to reduce memory footprint * Split generator tests in separate ci step and before running it reclaim memory by terminating containers * Moving tika dependent test to separate dir * Remove unwanted code * Brining reader under session scope * Farm is now session object hence restoring changes from default value * Updating assert for pdf converter * Dummy commit to trigger CI flow * REducing memory footprint required for generator tests * Fixing mypy issues * Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits * reducing changes * Fixing CI * changing elastic search ci * Fixing test error * Disabling return of embedding * Marking generator test as well * Refactoring tutorials * Increasing ES memory to 750M * Trying another fix for ES CI * Reverting CI changes * Splitting tests in CI * Generator and non-generator markers split * Adding pytest.ini to add markers and enable strict-markers option * Reducing elastic search container memory * Simplifying generator test by using documents with embedding directly * Bump up farm to 0.5.0	2020-10-30 18:06:02 +01:00
Tanay Soni	db4151bbc0	Fix scoring in Elasticsearch for dot product (#517 )	2020-10-23 17:50:49 +02:00
Lalit Pagaria	2e9f3c1512	Fix update_embeddings function in FAISSDocumentStore and add retriever fixture in tests (#481 ) * 1. Prevent update_embeddings function in FAISSDocumentStore to set faiss_index as None when document store does not have any docs. 2. cleaning up tests by adding fixture for retriever. * TfidfRetriever need document store with documents during initialization as it call fit() function in constructor so fixing it by checking self.paragraphs of None * Fix naming of retriever's fixture (embedded to embedding and tfid to tfidf)	2020-10-14 16:15:04 +02:00
Malte Pietsch	9727829cc6	Rename and restructure modules (database, indexing, schemas) (#379 ) * rename database to documentstore * move document, label, multilabel to haystack/schema.py * rename documentstore -> document_store * split indexing modules -> file_converter + preprocessor * fix order of imports * Update tutorial notebooks * fix torch version in tutorial 4	2020-09-16 18:33:23 +02:00
brandenchan	cca8676f90	More robust eval	2020-08-26 12:01:59 +02:00
bogdankostic	5186d2d235	Batch prediction in evaluation (#137 ) * Add Batch evaluation * Separate evaluation methods * Clean calculation of eval metrics * Adapt eval to Label objects * Fix format of no_answer * Adapt to MultiLabel * Add tests	2020-08-10 19:30:31 +02:00
Malte Pietsch	29a15c0d59	Add eval for Dense Passage Retriever & Refactor handling of labels/feedback (#243 )	2020-07-31 11:34:06 +02:00

17 Commits