28 Commits

Author SHA1 Message Date
Lalit Pagaria
3f81c93f36
Add document update for SQL and FAISS Document Store (#584) 2020-11-16 16:08:13 +01:00
Tanay Soni
3e095ddd7d
Add filters for delete_all_documents() (#591) 2020-11-16 14:15:32 +01:00
Lalit Pagaria
f13443054a
[RAG] Integrate "Retrieval-Augmented Generation" with Haystack (#484)
* Adding dummy generator implementation

* Adding tutorial to try the model

* Committing current non-working code

* Committing current update where we need to call generate function directly and need to convert embedding to tensor way

* Addressing review comments.

* Refactoring finder, and implementing rag_generator class.

* Refined the implementation of RAGGenerator and now it is in clean shape

* Renaming RAGGenerator to RAGenerator

* Reverting change from finder.py and addressing review comments

* Remove support for RagSequenceForGeneration

* Utilizing embed_passage function from DensePassageRetriever

* Adding sample test data to verify generator output

* Updating testing script

* Updating testing script

* Fixing bug related to top_k

* Updating latest farm dependency

* Comment out farm dependency

* Reverting changes from TransformersReader

* Adding transformers dataset to compare transformers and haystack generator implementation

* Using generator_encoder instead of question_encoder to generate context_input_ids

* Adding workaround to install FARM dependency from master branch

* Removing unnecessary changes

* Fixing generator test

* Removing transformers datasets

* Fixing generator test

* Some cleanup and updating TODO comments

* Adding tutorial notebook

* Updating tutorials with comments

* Explicitly passing token model in RAG test

* Addressing review comments

* Fixing notebook

* Refactoring tests to reduce memory footprint

* Split generator tests in separate ci step and before running it reclaim memory by terminating containers

* Moving tika dependent test to separate dir

* Remove unwanted code

* Bringing reader under session scope

* Farm is now session object hence restoring changes from default value

* Updating assert for pdf converter

* Dummy commit to trigger CI flow

* Reducing memory footprint required for generator tests

* Fixing mypy issues

* Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits

* reducing changes

* Fixing CI

* changing Elasticsearch CI

* Fixing test error

* Disabling return of embedding

* Marking generator test as well

* Refactoring tutorials

* Increasing ES memory to 750M

* Trying another fix for ES CI

* Reverting CI changes

* Splitting tests in CI

* Generator and non-generator markers split

* Adding pytest.ini to add markers and enable strict-markers option

* Reducing Elasticsearch container memory

* Simplifying generator test by using documents with embedding directly

* Bump up farm to 0.5.0
2020-10-30 18:06:02 +01:00
Tanay Soni
3bec264d76
Add filters for document count (#512) 2020-10-22 12:42:13 +02:00
Tanay Soni
669c72d538
Enable bulk operations on vector IDs for FAISSDocumentStore (#460) 2020-10-02 14:43:25 +02:00
Malte Pietsch
271ff30262
fix type casting of embeddings for tutorial 4 (#402) 2020-09-18 18:10:50 +02:00
Tanay Soni
0859da8f74
Fix document filtering in SQLDocumentStore (#396) 2020-09-18 12:22:52 +02:00
Malte Pietsch
9727829cc6
Rename and restructure modules (database, indexing, schemas) (#379)
* rename database to documentstore

* move document, label, multilabel to haystack/schema.py

* rename documentstore -> document_store

* split indexing modules -> file_converter + preprocessor

* fix order of imports

* Update tutorial notebooks

* fix torch version in tutorial 4
2020-09-16 18:33:23 +02:00
bogdankostic
f388ca025c
Aggregate multiple no answers in MultiLabel (#324)
* Aggregate multiple no answers

* Add test for multiple no answers
2020-08-18 18:25:01 +02:00
bogdankostic
b30963d0cd
Add Tests for MultiLabel (#318)
* Add tests for MultiLabel

* Add test for no_answer and is_correct_answer=False + fix bug in MultiLabel aggregation

* Fix bug in MultiLabel aggregation
2020-08-17 20:14:31 +02:00
Tanay Soni
089fecf99e
Fix indexing of metadata for FAISS/SQL Document Store (#310) 2020-08-13 12:25:32 +02:00
Karim Jana
c7078a36c0
Custom fields for indexing in ElasticsearchDocumentStore (#297) 2020-08-10 11:34:39 +02:00
Tanay Soni
9d0df60aad
Add FAISS Document Store (#253) 2020-08-07 14:25:08 +02:00
Tanay Soni
5937f9cf16
Deprecate Tags for Document Stores (#286) 2020-08-04 14:24:12 +02:00
Tanay Soni
d90435efd6 Add wait for Elasticsearch update call 2020-07-31 12:06:27 +02:00
Tanay Soni
5210c8c2ab
Add method to update meta fields for documents in Elasticsearch (#242) 2020-07-16 15:34:55 +02:00
Tanay Soni
b886e054a3
Move document_name attribute to meta (#217) 2020-07-14 09:53:31 +02:00
Tanay Soni
ef9e4f4467
Add PDF text extraction (#109) 2020-06-08 11:07:19 +02:00
Stan Kirdey
bf8e506c45
Add embedding query for InMemoryDocumentStore 2020-05-18 14:47:41 +02:00
Stan Kirdey
72a3b70d7a
Add filtering by tags for InMemoryDocumentStore (#108) 2020-05-14 22:12:25 +02:00
Tanay Soni
37e0ff70f7
Add test for Elasticsearch document store (#88) 2020-05-04 18:00:07 +02:00
Tanay Soni
8e736cefa0
Simplify Retriever query (#73) 2020-04-27 12:19:59 +02:00
Tanay Soni
f83a164095
Add Elasticsearch Document Store (#13) 2020-01-24 18:24:07 +01:00
Malte Pietsch
3ccd42f981 fix test 2020-01-23 15:25:42 +01:00
Malte Pietsch
8a48cd7dd6 fix test 2020-01-23 09:18:15 +01:00
Tanay Soni
845062ce2d Fix tests 2020-01-22 16:08:52 +01:00
Tanay Soni
d2c77f3077 Fix test 2019-11-27 19:34:10 +01:00
Malte Pietsch
7400abe327 add test 2019-11-27 17:53:42 +01:00