* Refactored code to unify `vector_dim` and `embedding_dim` parameters in DocumentStores
* Unit test cases updated to use `embedding_dim` instead of `vector_dim`
* Update unit test case to use `embedding_dim` instead of `vector_dim`
* Add latest docstring and tutorial changes
* Put usage of `vector_dim` param in same if-block as corresponding warning
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* change column order for evaluation dataframe
* added missing eval column node_input
* generic order for both document and answer returning nodes; ensure no columns get lost
Co-authored-by: tstadel <60758086+tstadel@users.noreply.github.com>
* check multiprocessing sharing strategy is available
* Change default of multiprocessing strategy to None
* Change default sharing strategy to None in retriever
* Add latest docstring and tutorial changes
* Make logging message easier to understand
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* add tinybert data augmentation
* don't reload glove in tinybert data augmentation
* fix unnecessary load_glove call
* fix type hints
* add comments and type hints
* add batch_size argument
* don't predict subwords as alternative for words
* fix subword predictions
* limit sequence length
* actually limit sequence length
* improve performance by calculating nearest glove vector on gpu
* add model and tokenizer parameter
* fix type hints
* improve data augmentation performance
* explained limits of script
* corrected comment
* added data augmentation test
* don't label every question in augmented dataset as impossible
* add sample glove
* better handling of downloading of glove
* fix typo of last commit
* Add duplicate_documents to base class initialization
* Remove redundant assignment in subclasses
Co-authored-by: Yorick van Zweeden <git@yorickvanzweeden.nl>
* fix #1687
* fix RuntimeError: received 0 items of ancdata
* Add an arg multiprocessing_strategy to DataSilo and DPR.train()
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* fix #1687
* fix UserWarning: Creating a tensor from a list of numpy.ndarrays is extremely slow.
* fix RuntimeError: received 0 items of ancdata
* Remove set_sharing_strategy from this branch and replace numpy.zeros_like with python numpy
* Add ParsrConverter
* Fix typing error + add Parsr to Linux CI
* Fix valid_language for all converters + fix context generation for ParsrConverter
* Remove ParsrConverter test from WindowsCI
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* set fixture scope to "function"
* run FARMReader without multiprocessing
* dispose of ray after tests
* run most expensive tasks first in test files
* run expensive tests first
* run garbage collector between tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* upgrade to pytorch 1.10 and transformers 4.11.3
* pin torch to 1.9.1
* Upgrade transformers and torch to 4.12.2 and 1.10.0
* Test transformers 4.10.2
* Pin transformers to 4.10.2
* transformers 4.10.3
* transformers 4.11.0
* transformers 4.11.1
* transformers 4.11.2
* check fix on current transformers master branch
* Install transformers from commit id
* update transformers to 4.12.5
* Upgrade torch version for torch-scatter
* Upgrade torch version for torch-scatter in Windows CI
* Build new cache
* Undo last commit
* Use transformers v4.11.2
* bump transformers to 4.12.5
* bump transformers to 4.13.0
* re-allow range of torch versions
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Base API health check on status code rather than JSON decoding
* Install UI dependencies on the Linux and Windows CI
Co-authored-by: Fabrice Depaulis <fabrice.depaulis@orange.com>
Co-authored-by: ZanSara <sarazanzo94@gmail.com>