* Adding translator with many generic input parameter support
* Making dict_key as generic
* Fixing mypy issue
* Adding pipeline and using opus models
* Add latest docstring and tutorial changes
* Adding test cases for end-to-end translation for generator, summerizer etc
* raise error join and merge nodes
* Fix test failure
* add docstrings. add usage documentation. rm skip_special_tokens param
* Add latest docstring and tutorial changes
* fix code snippets in md
* Adding few extra configuration parameters and fixing tests
* Fixingmypy issue and updating usage document
* fix for mypy issue in pipeline.py
* reverting renaming of pytest_collection_modifyitems method
* Addressing review comments
* setting skip_special_tokens to True
* removing model_max_length argument as None type is not supported to many models
* Removing padding parameter. Better to leave it as default otherwise it cause tensor size miss match error. If this option required by used then it can be added later.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Integration of SummarizationQAPipeline with Haystack.
* Moving summarizer tests because of OOM issue
* Fixing typo
* Splitting summarizer test in separate ci step
* Removing sysctl configuration as we already running elastic search in docker container
* fixing mypy issue
* update parameter names and docstrings
* update parameter names in BaseSummarizer
* rename pipeline
* change return type of summarizer from answer to document
* change scope of doc store fixture
* revert scope
* temp. disable test_faiss_index_save_and_load()
* fix mypy. change order for mypy in CI
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Adding dummy generator implementation
* Adding tutorial to try the model
* Committing current non working code
* Committing current update where we need to call generate function directly and need to convert embedding to tensor way
* Addressing review comments.
* Refactoring finder, and implementing rag_generator class.
* Refined the implementation of RAGGenerator and now it is in clean shape
* Renaming RAGGenerator to RAGenerator
* Reverting change from finder.py and addressing review comments
* Remove support for RagSequenceForGeneration
* Utilizing embed_passage function from DensePassageRetriever
* Adding sample test data to verify generator output
* Updating testing script
* Updating testing script
* Fixing bug related to top_k
* Updating latest farm dependency
* Comment out farm dependency
* Reverting changes from TransformersReader
* Adding transformers dataset to compare transformers and haystack generator implementation
* Using generator_encoder instead of question_encoder to generate context_input_ids
* Adding workaround to install FARM dependency from master branch
* Removing unnecessary changes
* Fixing generator test
* Removing transformers datasets
* Fixing generator test
* Some cleanup and updating TODO comments
* Adding tutorial notebook
* Updating tutorials with comments
* Explicitly passing token model in RAG test
* Addressing review comments
* Fixing notebook
* Refactoring tests to reduce memory footprint
* Split generator tests in separate ci step and before running it reclaim memory by terminating containers
* Moving tika dependent test to separate dir
* Remove unwanted code
* Brining reader under session scope
* Farm is now session object hence restoring changes from default value
* Updating assert for pdf converter
* Dummy commit to trigger CI flow
* REducing memory footprint required for generator tests
* Fixing mypy issues
* Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits
* reducing changes
* Fixing CI
* changing elastic search ci
* Fixing test error
* Disabling return of embedding
* Marking generator test as well
* Refactoring tutorials
* Increasing ES memory to 750M
* Trying another fix for ES CI
* Reverting CI changes
* Splitting tests in CI
* Generator and non-generator markers split
* Adding pytest.ini to add markers and enable strict-markers option
* Reducing elastic search container memory
* Simplifying generator test by using documents with embedding directly
* Bump up farm to 0.5.0
* Adding support to return embedding along with other result via query_by_embedding function
* Adding test case to check return embedding
* By default for all tests but DPR tests: disable return_embedding flag
* Reducing None test case and fixing query_by_embedding of ElasticsearchDocumentStore when it updating self.excluded_meta_data directly
* Fixing mypy reported issue
* 1. Prevent update_embeddings function in FAISSDocumentStore to set faiss_index as None when document store does not have any docs.
2. cleaning up tests by adding fixture for retriever.
* TfidfRetriever need document store with documents during initialization as it call fit() function in constructor so fixing it by checking self.paragraphs of None
* Fix naming of retriever's fixture (embedded to embedding and tfid to tfidf)
* remove phi normalization
* add special case for hnsw
* rename vector_size to vector_dim
* fix loading. fix extra dim in tests
* switch to new ES syntax for vector similarity
* 3x sql speed up. cascade deletes. add train_index()
* add docstrings. remove vector_dim from load()
* delete docs from faiss and sql
* fix delete of docs in test
* relax type hint for faiss index
* rename metric to metric_type
Co-authored-by: lalitpagaria <19303690+lalitpagaria@users.noreply.github.com>