11 Commits

Author SHA1 Message Date
Sara Zan
6354528336
Add /documents/get_by_filters endpoint (#1580)
* Add endpoint to get documents by filter

* Add test for /documents/get_by_filter and extend the delete documents test

* Add rest_api/file-upload to .gitignore

* Make sure the document store is empty for each test

* Improve docstrings of delete_documents_by_filters and get_documents_by_filters

Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2021-10-12 10:53:54 +02:00
Sara Zan
3539e6b041
Fix circular import in the REST API (#1556)
* Fix circular import in the REST API

* remove unneeded import in test

Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
2021-10-04 21:18:23 +02:00
Sara Zan
af4a44fcbd
WIP Add rest api endpoint to delete documents by filter (#1546)
* Add rest api endpoint to delete documents by filter.

* Remove parametrization of rest api tests

* Make the paths in rest_api/config.py absolute

* Fix path to pipelines.yaml

* Restructuring test_rest_api.py to be able to test only my endpoint (and to make the suite more structured)

* Convert DELETE /documents into POST /documents/delete_by_filters

Co-authored by:  sarthakj2109 <54064348+sarthakj2109@users.noreply.github.com>
2021-10-04 11:21:00 +02:00
oryx1729
9dd7c74f4f
Refactor communication between Pipeline Components (#1321) 2021-09-10 11:41:16 +02:00
Ikram Ali
29e140196b
[pipeline] Allow for batch indexing when using Pipelines fix #1168 (#1231)
* [pipeline] Allow for batch indexing when using Pipelines fix #1168

* [pipeline] Test case fixed fix #1168

* [file_converter] Path.suffix updated #1168

* [file_converter] meta can be one of these three cases:
                 A single dict that is applied to all files
                 One dict for each file being converted
                 None #1168

* [file_converter] mypy error fixed.

* [file_converter] mypy error fixed.

* [rest_api] batch file upload introduced in indexing API.

* [test_case] Test_api file upload parameter name updated.

* [ui] Streamlit file upload parameter updated.
2021-06-30 14:13:46 +02:00
oryx1729
8c68699e1c
Refactor REST APIs to use Pipelines (#922) 2021-04-07 17:53:32 +02:00
Tanay Soni
f95b70df38
Fix file upload API (#808) 2021-02-05 12:17:38 +01:00
Tanay Soni
acd088808b
Allow list of filter values in REST API (#568) 2020-11-09 20:41:53 +01:00
Lalit Pagaria
f13443054a
[RAG] Integrate "Retrieval-Augmented Generation" with Haystack (#484)
* Adding dummy generator implementation

* Adding tutorial to try the model

* Committing current non working code

* Committing current update where we need to call generate function directly and need to convert embedding to tensor way

* Addressing review comments.

* Refactoring finder, and implementing rag_generator class.

* Refined the implementation of RAGGenerator and now it is in clean shape

* Renaming RAGGenerator to RAGenerator

* Reverting change from finder.py and addressing review comments

* Remove support for RagSequenceForGeneration

* Utilizing embed_passage function from DensePassageRetriever

* Adding sample test data to verify generator output

* Updating testing script

* Updating testing script

* Fixing bug related to top_k

* Updating latest farm dependency

* Comment out farm dependency

* Reverting changes from TransformersReader

* Adding transformers dataset to compare transformers and haystack generator implementation

* Using generator_encoder instead of question_encoder to generate context_input_ids

* Adding workaround to install FARM dependency from master branch

* Removing unnecessary changes

* Fixing generator test

* Removing transformers datasets

* Fixing generator test

* Some cleanup and updating TODO comments

* Adding tutorial notebook

* Updating tutorials with comments

* Explicitly passing token model in RAG test

* Addressing review comments

* Fixing notebook

* Refactoring tests to reduce memory footprint

* Split generator tests in separate ci step and before running it reclaim memory by terminating containers

* Moving tika dependent test to separate dir

* Remove unwanted code

* Brining reader under session scope

* Farm is now session object hence restoring changes from default value

* Updating assert for pdf converter

* Dummy commit to trigger CI flow

* REducing memory footprint required for generator tests

* Fixing mypy issues

* Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits

* reducing changes

* Fixing CI

* changing elastic search ci

* Fixing test error

* Disabling return of embedding

* Marking generator test as well

* Refactoring tutorials

* Increasing ES memory to 750M

* Trying another fix for ES CI

* Reverting CI changes

* Splitting tests in CI

* Generator and non-generator markers split

* Adding pytest.ini to add markers and enable strict-markers option

* Reducing elastic search container memory

* Simplifying generator test by using documents with embedding directly

* Bump up farm to 0.5.0
2020-10-30 18:06:02 +01:00
Lalit Pagaria
abda994116
Pytest fix memory leak and put pytest marker on slow tests (#520)
* Clear faiss_index during teardown

* Marking slow test with pytest markers. So In future these test can be optimized. Also command line option can be added to skip them refer https://pytest.org/en/stable/example/simple.html#control-skipping-of-tests-according-to-command-line-option

* Fixing test
2020-10-26 19:19:10 +01:00
Lalit Pagaria
b9da789475
Add Elasticsearch Query DSL compliant Query API (#471) 2020-10-16 13:25:31 +02:00