* Refactored code to unify vector_dim and embedding_dim parameter in DocumentStores
* Unit test cases updated to use `embedding_dim` instead of `vector_dim`
* Unit test case update to use embedding_dim instead of vector_dim
* Add latest docstring and tutorial changes
* Put usage of `vector_dim` param in same if-block as corresponding warning
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* set fixture scope to "function"
* run FARMReader without multiprocessing
* dispose off ray after tests
* run most expensive tasks first in test files
* run expensive tests first
* run garbage collector between tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Make Tutorial10 use print instead of logs and fix a typo in Tutoria15
* Add a type check in 'print_answers'
* Add same checks to print_documents and print_questions
* Make RAGenerator return Answers instead of dictionaries
* Fix RAGenerator tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Feat: Removing use of temp file while downloading archive from url along with adding CI for windows and mac platform
* Windows CI by default installing pytorch gpu hence updating CI to pick cpu version
* fixing mac cache build issue
* updating windows pip install command for torch
* another attempt
* updating ci
* Adding sudo
* fixing ls failure on windows
* another attempt to fix build issue
* Saving env variable of test files
* Adding debug log
* Github action differ on windows
* adding debug
* anohter attempt
* Windows have different ways to receive env
* fixing template
* minor fx
* Adding debug
* Removing use of json
* Adding back fromJson
* addin toJson
* removing print
* anohter attempt
* disabling parallel run at least for testing
* installing docker for mac runner
* correcting docker install command
* Linux dockers are not suported in windows
* Removing mac changes
* Upgrading pytorch
* using lts pytorch
* Separating win and ubuntu
* Install java 11
* enabling linux container env
* docker cli command
* docker cli command
* start elastic service
* List all service
* correcting service name
* Attempt to fix multiple test run
* convert to json
* another attempt to check
* Updating build cache step
* attempt
* Add tika
* Separating windows CI
* Changing CI name
* Skipping test which does not work in windows
* Skipping tests for windows
* create cleanup function in conftest
* adding skipif marker on tests
* Run windows PR on only push to master
* Addressing review comments
* Enabling windows ci for this PR
* Tika init is being called when importing tika function
* handling tika import issue
* handling tika import issue in test
* Fixing import issue
* removing tika fixure
* Removing fixture from tests
* Disable windows ci on pull request
* Add back extra pytorch install step
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Files moved, imports all broken
* Fix most imports and docstrings into
* Fix the paths to the modules in the API docs
* Add latest docstring and tutorial changes
* Add a few pipelines that were lost in the inports
* Fix a bunch of mypy warnings
* Add latest docstring and tutorial changes
* Create a file_classifier module
* Add docs for file_classifier
* Fixed most circular imports, now the REST API can start
* Add latest docstring and tutorial changes
* Tackling more mypy issues
* Reintroduce from FARM and fix last mypy issues hopefully
* Re-enable old-style imports
* Fix some more import from the top-level package in an attempt to sort out circular imports
* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests
* Change document_store into document_stores
* Update imports in tutorials
* Add latest docstring and tutorial changes
* Probably fixes summarizer tests
* Improve the old-style import allowing module imports (should work)
* Try to fix the docs
* Remove dedicated KnowledgeGraph page from autodocs
* Remove dedicated GraphRetriever page from autodocs
* Fix generate_docstrings.sh with an updated list of yaml files to look for
* Fix some more modules in the docs
* Fix the document stores docs too
* Fix a small issue on Tutorial14
* Add latest docstring and tutorial changes
* Add deprecation warning to old-style imports
* Remove stray folder and import Dict into dense.py
* Change import path for MLFlowLogger
* Add old loggers path to the import path aliases
* Fix debug output of convert_ipynb.py
* Fix circular import on BaseRetriever
* Missed one merge block
* re-run tutorial 5
* Fix imports in tutorial 5
* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base
* Add latest docstring and tutorial changes
* Fix typo in utils __init__
* Fix a few more imports
* Fix benchmarks too
* New-style imports in test_knowledge_graph
* Rollback setup.py
* Rollback squad_to_dpr too
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add node names validation
* Add tests
* Improve test and test that params exists before validating
* Fix the REST API
* Use minilm-uncased-squad2 instead of roberta-base-squad2
* Use roberta model for test_pipeline.yaml
* Turn off TOKENIZERS_PARALLELISM in generator tests (#1605)
* Account for non-targeted parameters
* Restore previous parameters handling in the rest api
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* simplify tests for individual doc stores
* WIP refactoring markers of tests
* test alternative approach for tests with existing parametrization
* fix skip logic of already parametrized tests
* fix weaviate behaviour in tests - not parametrizing it in our general test cases.
* Add latest docstring and tutorial changes
* fix some tests
* remove sql from document_store_types
* fix markers for generator and pipeline test
* remove inmemory marker
* remove unneeded elasticsearch markers
* update readme and contributing.md
* update contributing
* adjust example
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Adding translator with many generic input parameter support
* Making dict_key as generic
* Fixing mypy issue
* Adding pipeline and using opus models
* Add latest docstring and tutorial changes
* Adding test cases for end-to-end translation for generator, summerizer etc
* raise error join and merge nodes
* Fix test failure
* add docstrings. add usage documentation. rm skip_special_tokens param
* Add latest docstring and tutorial changes
* fix code snippets in md
* Adding few extra configuration parameters and fixing tests
* Fixingmypy issue and updating usage document
* fix for mypy issue in pipeline.py
* reverting renaming of pytest_collection_modifyitems method
* Addressing review comments
* setting skip_special_tokens to True
* removing model_max_length argument as None type is not supported to many models
* Removing padding parameter. Better to leave it as default otherwise it cause tensor size miss match error. If this option required by used then it can be added later.
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Adding dummy generator implementation
* Adding tutorial to try the model
* Committing current non working code
* Committing current update where we need to call generate function directly and need to convert embedding to tensor way
* Addressing review comments.
* Refactoring finder, and implementing rag_generator class.
* Refined the implementation of RAGGenerator and now it is in clean shape
* Renaming RAGGenerator to RAGenerator
* Reverting change from finder.py and addressing review comments
* Remove support for RagSequenceForGeneration
* Utilizing embed_passage function from DensePassageRetriever
* Adding sample test data to verify generator output
* Updating testing script
* Updating testing script
* Fixing bug related to top_k
* Updating latest farm dependency
* Comment out farm dependency
* Reverting changes from TransformersReader
* Adding transformers dataset to compare transformers and haystack generator implementation
* Using generator_encoder instead of question_encoder to generate context_input_ids
* Adding workaround to install FARM dependency from master branch
* Removing unnecessary changes
* Fixing generator test
* Removing transformers datasets
* Fixing generator test
* Some cleanup and updating TODO comments
* Adding tutorial notebook
* Updating tutorials with comments
* Explicitly passing token model in RAG test
* Addressing review comments
* Fixing notebook
* Refactoring tests to reduce memory footprint
* Split generator tests in separate ci step and before running it reclaim memory by terminating containers
* Moving tika dependent test to separate dir
* Remove unwanted code
* Brining reader under session scope
* Farm is now session object hence restoring changes from default value
* Updating assert for pdf converter
* Dummy commit to trigger CI flow
* REducing memory footprint required for generator tests
* Fixing mypy issues
* Marking test with tika and elasticsearch markers. Reverting changes in CI and pytest splits
* reducing changes
* Fixing CI
* changing elastic search ci
* Fixing test error
* Disabling return of embedding
* Marking generator test as well
* Refactoring tutorials
* Increasing ES memory to 750M
* Trying another fix for ES CI
* Reverting CI changes
* Splitting tests in CI
* Generator and non-generator markers split
* Adding pytest.ini to add markers and enable strict-markers option
* Reducing elastic search container memory
* Simplifying generator test by using documents with embedding directly
* Bump up farm to 0.5.0