* Speed up query_by_embedding in InMemoryDocumentStore.
* Make sure query and document embeddings are of the same dtype since they can vary.
* Handle cases where there are 0 and 1 documents.
* Don't put entire embedding matrix on GPU at once. Use separate get_score
functions for the CPU and GPU.
* Normalize the vectors in get_scores_numpy in a safer way.
* Apply Black
* Incorporate missing factor of 4 in memory use calculation.
* Apply Black
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Upgrade pydoc-markdown and fix the YAMLs to work with it
* Pin pydoc-markdown to major version
* Generalize pydoc-markdown workflow
* Make a single Action to perform all tasks that require committing into the local branch
* Merge the code updates and the docs in the Linux CI to prevent the bot from always showing the pipeline as green
* Installing Jupyter deps for Black
* Build cache before running generation tasks
* Add check not to run the code generation on master
* Simplify push action
* Add more test deps in setup.cfg and remove from GH Action workflow
* Remove forced upgrades on pip install
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Conversion to df does not need initialization
* Apply Black
* fix test case
* Apply Black
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* add filters attribute to labels and use in eval
* Add latest docstring and tutorial changes
* overwrite params if None
* populate filters from Label to MultiLabel
* add query_id in eval df and deepcopy params for each label
* fix mypy
* add test for aggregating filters in multilabel
* use query ids also in answers df
* loop through unique query_ids
* hash filters and query text as id
* Add latest docstring and tutorial changes
* fix top_k reader eval
* Apply Black
* rename query_id to id/multilabel_id
* Apply Black
* json dump filters in dataframe
* add filters and id to wrong_examples()
* Apply Black
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
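
The "hash filters and query text as id" step above could be sketched roughly as follows. This is a minimal illustration, not the code from this change: the helper name `make_multilabel_id`, the choice of SHA-256, and the JSON serialization are all assumptions made for the example.

```python
import hashlib
import json
from typing import Any, Dict, Optional


def make_multilabel_id(query: str, filters: Optional[Dict[str, Any]] = None) -> str:
    """Derive a stable id from the query text and its filters.

    Hypothetical helper for illustration only; the real implementation may differ.
    """
    # sort_keys=True makes the serialization independent of dict insertion order,
    # so logically equal filters always produce the same id.
    payload = json.dumps({"query": query, "filters": filters}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Same query and same filters in a different key order give the same id.
id_a = make_multilabel_id("Who wrote Faust?", {"year": ["1808"], "lang": ["de"]})
id_b = make_multilabel_id("Who wrote Faust?", {"lang": ["de"], "year": ["1808"]})
# Same query without filters gives a different id.
id_c = make_multilabel_id("Who wrote Faust?")
```

Serializing with sorted keys before hashing is one way to keep labels that share a query but differ in filters apart while staying deterministic across runs.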
* Testing black on ui/
* Applying black on docstores
* Add latest docstring and tutorial changes
* Create a single GH action for Black and docs to reduce commit noise to the minimum, slightly refactor the OpenAPI action too
* Remove comments
* Relax constraints on pydoc-markdown
* Split temporary black from the docs. Pydoc-markdown was obsolete and needs a separate PR to upgrade
* Fix a couple of bugs
* Add a type: ignore that was missing somehow
* Give path to black
* Apply Black
* Apply Black
* Relocate a couple of type: ignore
* Update documentation
* Make Linux CI run after applying Black
* Triggering Black
* Apply Black
* Remove dependency, does not work well
* Manually remove double trailing commas
* Update documentation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Make FileTypeClassifier more flexible
* Make supported_types an init parameter
* Add tests and fix a couple of bugs
* Formatting
* Fix mypy
* Implement feedback
* Adding simple setup.py to ui/ and rest_api and remove respective extras from main setup.cfg
* Make 'pip install rest_api/' fetch the local Haystack instead of downloading from pypi
* Add some comments to the new setup.py files and fix the Dockerfiles
* Add version info to 'farm-haystack-ui'
* Fix the OpenAPI Specs workflow
* Install rest_api and ui properly on the CI too
* Make the workflow see changes on every setup file
* Fix workflow cache keys
* Add license to rest_api and ui
* Revert "Make the docstring bot work only on master (#2078)"
This reverts commit 649d07405770cd59696d0120107a3b2f0aafe7c2.
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* 🎨 Update type annotations to allow their extraction for JSON Schema
* ✨ Add main script doing all the work to generate the JSON Schema
* ➕ Add GitHub Action dependency to generate JSON Schema
* ✨ Update JSON Schema generation script to allow easily generating the schema without making a PR
* 👷 Add GitHub Action to generate JSON Schema
* 💚 Fix CI GitHub Action
* 💚 Update GitHub Action environment variables
* ✨ Add initial JSON Schema
* Add latest docstring and tutorial changes
* 🐛 Do not allow extra params not defined in each model
* ♻️ Make any additional properties invalid
* ✨ Make other additional properties invalid in all the levels in pipelines
* ♻️ Do not include Base classes as possible nodes
* 🍱 Update JSON Schema
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* provide option to recreate es doc store on initialization
* Add latest docstring and tutorial changes
* Label expects more arguments
* Label expects also an answer
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Sara Zan <sara.zanzottera@deepset.ai>
* distribute tinybert loss calculation
* improve doc string
* undo unnecessary change
* fix for only one gpu
* adding type hints
* making sure model distillation still works without gpu
* fix bug
* fixing type hints
* Review changes
* Added the synonym analyser for search fields
* Addressed the review requests.
* Added the synonyms to the OpenSearchDocumentStore and addressed review requests.
* Disable cache on the CI
* Reintroduce paths
* Add most files to the cache key
* remove date and path from cache key
* Try double install with cache
* Try to cache more stuff, on a per-commit basis
* Fix windows CI too
* Add comment on how to speed up the CI with better caching
* Add docstrings to the REST API endpoint to have them included in the OpenAPI specs
* Attempt at making GitHub CI generate the OpenAPI specs
* Missing __init__.py was breaking rest_api import
* Add comment on dummy pipeline
* Create separate workflow file for the OpenAPI specs generation
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Markus Paff <markuspaff.mp@gmail.com>
* Remove stray requirements.txt files and update README.md
* Remove requirement files
* Add details about pip bug and link to setup.cfg
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* First attempt at using setup.cfg for dependency management
* Trying the new package on the CI and in Docker too
* Add composite extras_require
* Add the safe_import function for document store imports and add some try-catch statements on rest_api and ui imports
* Fix bug on class import and rephrase error message
* Introduce typing for optional modules and add type: ignore in sparse.py
* Include importlib_metadata backport for py3.7
* Add colab group to extra_requires
* Fix pillow version
* Fix grpcio
* Separate out the crawler as another extra
* Make paths relative in rest_api and ui
* Update the test matrix in the CI
* Add try catch statements around the optional imports too to account for direct imports
* Never mix direct deps with self-references and add ES deps to the base install
* Refactor several paths in tests to make them insensitive to the execution path
* Include tstadel review and re-introduce Milvus1 in the tests suite, to fix
* Wrap pdf conversion utils into safe_import
* Update some tutorials and revert Milvus1 as default for now, see #2067
* Fix mypy config
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
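
The optional-import handling described above (a safe_import function plus try-catch statements around optional imports) can be approximated with the sketch below. It is an assumption-laden illustration, not the actual safe_import from this change: the name `optional_import` and its return-None behavior are hypothetical.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(module_name: str) -> Optional[ModuleType]:
    """Return the module if it is installed, otherwise None.

    Hypothetical helper for illustration; it is not Haystack's safe_import.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError:
        # Report the missing optional dependency instead of failing at import time.
        print(f"Optional dependency '{module_name}' is not installed; "
              "install the matching extras group to enable this feature.")
        return None


# A standard-library module is always importable.
json_module = optional_import("json")
# A module that does not exist yields None instead of raising.
missing = optional_import("surely_not_a_real_module_xyz")
```

Deferring the failure this way lets the base package import cleanly even when an extra such as a specific document store backend is absent.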
* fix data augmentation path in finetuning notebook
* Add latest docstring and tutorial changes
* make distillation possible with models other than BERT
* use smaller dataset for distillation in finetuning tutorial
* Add latest docstring and tutorial changes
* make data augmentation in finetuning faster
* update language models forward doc strings
* fix return type of language models
* remove debug output
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* doc store should return all documents matching ids passed to get_documents_by_id
* test for get_document_by_id should be named correctly
* add test for get_documents_by_id
* Add latest docstring and tutorial changes
* document es query limit
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* minimal DCDocumentStore
* support filters
* implement get_documents_by_id
* handle non-existing documents
* add docstrings
* auth added
* add tests
* generate docs
* Add latest docstring and tutorial changes
* add responses to dev dependencies
* fix tests
* support query() and query_by_embedding()
* Add latest docstring and tutorial changes
* query tests added
* read api_key and api_endpoint from env
* Add latest docstring and tutorial changes
* support query() and query_by_embedding()
* query tests added
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
* support dynamic similarity and return_embedding values
* Add latest docstring and tutorial changes
* adjust KeywordDocumentStore description
* refactoring
* Add latest docstring and tutorial changes
* implement get_document_count and raise on all not implemented methods
* Add latest docstring and tutorial changes
* don't use abbreviation DC in comments and errors
* Add latest docstring and tutorial changes
* docstring added to KeywordDocumentStore
* Add latest docstring and tutorial changes
* enhanced api key set
* split tests into two parts
* change setup.py in order to work around build cache
* added link
* Add latest docstring and tutorial changes
* rename DCDocumentStore to DeepsetCloudDocumentStore
* Add latest docstring and tutorial changes
* remove dc.py
* reinsert link to docs
* fix imports
* Add latest docstring and tutorial changes
* better test structure
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: ArzelaAscoIi <kristof.herrmann@rwth-aachen.de>
* add ndcg and eval_mode to docstrings and reorder dataframe columns in docs
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* add parameters to allow for different hyperparameters in stage 1 and 2 of tinybert distillation
* Add latest docstring and tutorial changes
* improve default parameters
* Add latest docstring and tutorial changes
* split up distillation method
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>