haystack

mirror of https://github.com/deepset-ai/haystack.git synced 2025-08-11 01:57:48 +00:00

Author	SHA1	Message	Date
Lalit Pagaria	f13443054a	[RAG] Integrate "Retrieval-Augmented Generation" with Haystack (#484) * Adding dummy generator implementation * Adding tutorial to try the model * Committing current non working code * Committing current update where we need to call generate function directly and need to convert embedding to tensor way * Addressing review comments. * Refactoring finder, and implementing rag_generator class. * Refined the implementation of RAGGenerator and now it is in clean shape * Renaming RAGGenerator to RAGenerator * Reverting change from finder.py and addressing review comments * Remove support for RagSequenceForGeneration * Utilizing embed_passage function from DensePassageRetriever * Adding sample test data to verify generator output * Updating testing script * Updating testing script * Fixing bug related to top_k * Updating latest farm dependency * Comment out farm dependency * Reverting changes from TransformersReader * Adding transformers dataset to compare transformers and haystack generator implementation * Using generator_encoder instead of question_encoder to generate context_input_ids * Adding workaround to install FARM dependency from master branch * Removing unnecessary changes * Fixing generator test * Removing transformers datasets * Fixing generator test * Some cleanup and updating TODO comments * Adding tutorial notebook * Updating tutorials with comments * Explicitly passing token model in RAG test * Addressing review comments * Fixing notebook * Refactoring tests to reduce memory footprint * Split generator tests in separate ci step and before running it reclaim memory by terminating containers * Moving tika dependent test to separate dir * Remove unwanted code * Brining reader under session scope * Farm is now session object hence restoring changes from default value * Updating assert for pdf converter * Dummy commit to trigger CI flow * REducing memory footprint required for generator tests * Fixing mypy issues * Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits * reducing changes * Fixing CI * changing elastic search ci * Fixing test error * Disabling return of embedding * Marking generator test as well * Refactoring tutorials * Increasing ES memory to 750M * Trying another fix for ES CI * Reverting CI changes * Splitting tests in CI * Generator and non-generator markers split * Adding pytest.ini to add markers and enable strict-markers option * Reducing elastic search container memory * Simplifying generator test by using documents with embedding directly * Bump up farm to 0.5.0	2020-10-30 18:06:02 +01:00
Tanay Soni	974b37eded	Add PreProcessor to simplify splitting and cleaning of docs (#473) * Add PreProcessing * Adjust PDF conversion tests * Add tests for Preprocessing * Add requirement * Fix tests * Ignore decoding errors for TextConverter * Rename split_size to split_length * Adjust tests Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>	2020-10-15 10:42:08 +02:00
Tanay Soni	3399fc784d	Refactor file converter interface (#393)	2020-09-18 10:42:13 +02:00
Malte Pietsch	9727829cc6	Rename and restructure modules (database, indexing, schemas) (#379) * rename database to documentstore * move document, label, multilabel to haystack/schema.py * rename documentstore -> document_store * split indexing modules -> file_converter + preprocessor * fix order of imports * Update tutorial notebooks * fix torch version in tutorial 4	2020-09-16 18:33:23 +02:00
Dany	403318b1f5	Add Tika Converter (#314)	2020-08-17 11:21:09 +02:00
Tanay Soni	1637ce1184	Revert "Add Tika Converter (#314)" This reverts commit 5ef59b1901da6d51bfa085683321a243228d4fc9.	2020-08-17 11:13:52 +02:00
Tanay Soni	5ef59b1901	Add Tika Converter (#314)	2020-08-14 14:13:59 +02:00
Tanay Soni	a349eef0db	Add API endpoint to upload files (#154)	2020-06-17 16:28:26 +02:00
Tanay Soni	ef9e4f4467	Add PDF text extraction (#109)	2020-06-08 11:07:19 +02:00

9 Commits