haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-08-22 15:38:01 +00:00

Author	SHA1	Message	Date
Lalit Pagaria	75d0ebd076	Add Summarizer (standalone + node in custom pipelines + SearchSummarizationPipeline) (#698 ) * Integration of SummarizationQAPipeline with Haystack. * Moving summarizer tests because of OOM issue * Fixing typo * Splitting summarizer test in separate ci step * Removing sysctl configuration as we already running elastic search in docker container * fixing mypy issue * update parameter names and docstrings * update parameter names in BaseSummarizer * rename pipeline * change return type of summarizer from answer to document * change scope of doc store fixture * revert scope * temp. disable test_faiss_index_save_and_load() * fix mypy. change order for mypy in CI Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-08 14:29:46 +01:00
Lalit Pagaria	3a9a756810	Using Columns names instead of ORM to get all documents (#620 ) * Using Columns name instead of ORM object for get all documents call * Separating meta search from documents. This way it will optimize the memory not duplicating document.text * Fixing mypy issue * SQLite have limit on number of host variable hence using batching to fetch meta information * Query meta only if meta field is not Null in DocOrm * Add batch_size to other functions except label * meta can be none so fix that issue * Dummy commit to trigger CI * Using chunked dictionary * Upgrading faiss * reverting change related to faiss upgrade * Changing DB name in test_faiss_retrieving test as it might interfere with exiting files by corrupting DB file * Updating doc string related to batch_size * Update docstring for batch_size Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2021-01-06 15:56:19 +01:00
Tanay Soni	33fe597949	Cleanup Pytest Fixtures (#639 )	2020-12-14 18:15:44 +01:00
Tanay Soni	8e52b48e1d	Add pipelines for GenerativeQA & FAQs (#645 )	2020-12-03 10:27:06 +01:00
Tanay Soni	ea976ba5b5	Add return_embedding parameter for get_all_documents() (#615 )	2020-11-26 10:32:30 +01:00
Lalit Pagaria	f13443054a	[RAG] Integrate "Retrieval-Augmented Generation" with Haystack (#484 ) * Adding dummy generator implementation * Adding tutorial to try the model * Committing current non working code * Committing current update where we need to call generate function directly and need to convert embedding to tensor way * Addressing review comments. * Refactoring finder, and implementing rag_generator class. * Refined the implementation of RAGGenerator and now it is in clean shape * Renaming RAGGenerator to RAGenerator * Reverting change from finder.py and addressing review comments * Remove support for RagSequenceForGeneration * Utilizing embed_passage function from DensePassageRetriever * Adding sample test data to verify generator output * Updating testing script * Updating testing script * Fixing bug related to top_k * Updating latest farm dependency * Comment out farm dependency * Reverting changes from TransformersReader * Adding transformers dataset to compare transformers and haystack generator implementation * Using generator_encoder instead of question_encoder to generate context_input_ids * Adding workaround to install FARM dependency from master branch * Removing unnecessary changes * Fixing generator test * Removing transformers datasets * Fixing generator test * Some cleanup and updating TODO comments * Adding tutorial notebook * Updating tutorials with comments * Explicitly passing token model in RAG test * Addressing review comments * Fixing notebook * Refactoring tests to reduce memory footprint * Split generator tests in separate ci step and before running it reclaim memory by terminating containers * Moving tika dependent test to separate dir * Remove unwanted code * Brining reader under session scope * Farm is now session object hence restoring changes from default value * Updating assert for pdf converter * Dummy commit to trigger CI flow * REducing memory footprint required for generator tests * Fixing mypy issues * Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits * reducing changes * Fixing CI * changing elastic search ci * Fixing test error * Disabling return of embedding * Marking generator test as well * Refactoring tutorials * Increasing ES memory to 750M * Trying another fix for ES CI * Reverting CI changes * Splitting tests in CI * Generator and non-generator markers split * Adding pytest.ini to add markers and enable strict-markers option * Reducing elastic search container memory * Simplifying generator test by using documents with embedding directly * Bump up farm to 0.5.0	2020-10-30 18:06:02 +01:00
Lalit Pagaria	abda994116	Pytest fix memory leak and put pytest marker on slow tests (#520 ) * Clear faiss_index during teardown * Marking slow test with pytest markers. So In future these test can be optimized. Also command line option can be added to skip them refer https://pytest.org/en/stable/example/simple.html#control-skipping-of-tests-according-to-command-line-option * Fixing test	2020-10-26 19:19:10 +01:00
Lalit Pagaria	2e9f3c1512	Fix update_embeddings function in FAISSDocumentStore and add retriever fixture in tests (#481 ) * 1. Prevent update_embeddings function in FAISSDocumentStore to set faiss_index as None when document store does not have any docs. 2. cleaning up tests by adding fixture for retriever. * TfidfRetriever need document store with documents during initialization as it call fit() function in constructor so fixing it by checking self.paragraphs of None * Fix naming of retriever's fixture (embedded to embedding and tfid to tfidf)	2020-10-14 16:15:04 +02:00
Malte Pietsch	8edeb844f7	Remove phi normalization from FAISS, support more index types, 3x speedup (#467 ) * remove phi normalization * add special case for hnsw * rename vector_size to vector_dim * fix loading. fix extra dim in tests * switch to new ES syntax for vector similarity * 3x sql speed up. cascade deletes. add train_index() * add docstrings. remove vector_dim from load() * delete docs from faiss and sql * fix delete of docs in test * relax type hint for faiss index * rename metric to metric_type Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>	2020-10-06 16:09:56 +02:00
Lalit Pagaria	465ccbc12e	Allow multiple write calls to existing FAISS index. (#422 ) - Fixing issue when update_embeddings always create new FAISS index instead of clearing existing one. New index creation may not free existing used memory and cause memory leak. Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-10-05 12:01:20 +02:00
Malte Pietsch	db6864d159	Fix type casting for vectors in FAISS (#399 ) * Fix type casting for vectors in FAISS Co-authored-by: philipp-bode <philipp.bode@student.hpi.de> * add type casts for elastic. refactor embedding retriever tests * fix case: empty embedding field * fix faiss tolerance * add assert in test_faiss_retrieving Co-authored-by: philipp-bode <philipp.bode@student.hpi.de>	2020-09-18 17:08:13 +02:00
Malte Pietsch	d69133966d	Fix faiss test tolerance	2020-09-18 13:57:29 +02:00
Malte Pietsch	4c503158a7	Fix duplicate vector ids in FAISS (#395 ) * fix duplicate vector ids in faiss * Add test Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com> * revert score change * switch to faiss_index.ntotal for ids. add tests Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>	2020-09-18 12:52:22 +02:00
Tanay Soni	9d0df60aad	Add FAISS Document Store (#253 )	2020-08-07 14:25:08 +02:00

14 Commits