* #4983 implemented split by token for tiktoken tokenizer
* #4983 added unit test for tiktoken splitting
* #4983 implemented and added a test for splitting documents with HuggingFace tokenizer
* #4983 added support for passing HF model names (instead of objects) and added an example to the HF token splitting test
* mocked HTTP model loading in unit tests, fixed pylint error
* fix splitting with lossy tokenizers, use LazyImport, ignore UnicodeEncodeError for tiktoken
* reno
* rename reno file
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
* fix tests
* reno
* tests
* retain file name
* paths are strings for the OpenAI SDK
* streams->sources
* feedback
* always add name to file
* mypy
* test placeholder with extension
* fallback
* paths
* path test
* path must be a string
* fix test
* Refactor HTMLToDocument
* Add release notes
* Add additional tests
* remove progress bar
* Add additional test for metadata
* remove progress bar from release notes
* Update tests
* Use truthiness checks instead of `is not None`
* Embedding instructions in EmbeddingRetriever
Query and document embeddings are prefixed with instructions, which is useful
for retrievers finetuned on specific tasks, such as Q&A.
* Tests
Check each vector's 0th component against a reference value, using different document stores.
* Normalizing vectors
* Release notes
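A minimal sketch of the instruction-prefixing and normalization idea described above, for illustration only. It is not the EmbeddingRetriever implementation: `embed_with_instruction`, `l2_normalize`, `toy_embed`, and the `"query: "` / `"passage: "` prefixes are hypothetical names chosen here, and `toy_embed` is a stand-in for a real embedding model.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]


def l2_normalize(vec: Sequence[float]) -> Vector:
    # Scale to unit length; an all-zero vector is returned unchanged to avoid dividing by zero.
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        return list(vec)
    return [x / norm for x in vec]


def embed_with_instruction(
    texts: Sequence[str],
    instruction: str,
    embed_fn: Callable[[List[str]], List[Vector]],
    normalize: bool = True,
) -> List[Vector]:
    # Hypothetical helper: prepend the task instruction to every text before embedding it.
    prefixed = [f"{instruction}{text}" for text in texts]
    vectors = embed_fn(prefixed)
    if normalize:
        return [l2_normalize(v) for v in vectors]
    return [list(v) for v in vectors]


# Toy stand-in for a real encoder (assumption): the first component is the prefixed text length.
def toy_embed(batch: List[str]) -> List[Vector]:
    return [[float(len(s)), 1.0] for s in batch]


query_vecs = embed_with_instruction(["what is reno?"], "query: ", toy_embed)
doc_vecs = embed_with_instruction(["reno manages release notes"], "passage: ", toy_embed)
```

Queries and documents get different prefixes so that a model finetuned with task instructions sees the same kind of input it was trained on. Normalizing afterwards makes dot-product and cosine scores agree.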
* Implement function to convert legacy filters to new style
* Reduce return statements in conversion to fix linting
* Move convert function in different module
* Fix typos in docstrings
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Move tests for delete_documents from DocumentStoreBaseTests to separate class
* Move `filterable_docs` fixture from `DocumentStoreBaseTests` to separate mixin class (#6337)
* refactor: Move generic `filter_documents` tests from `DocumentStoreBaseTests` to separate class (#6338)
* refactor: Move `filter_documents` tests with invalid filters from `DocumentStoreBaseTests` to separate class (#6339)
* Move `filter_documents` tests with equal filters from `DocumentStoreBaseTests` to separate class (#6340)
* Move `filter_documents` tests with not equal filters from `DocumentStoreBaseTests` to separate class (#6341)
* Move `filter_documents` tests with in filters from `DocumentStoreBaseTests` to separate class (#6342)
* Move `filter_documents` tests with not in filters from `DocumentStoreBaseTests` to separate class (#6343)
* Move `filter_documents` tests with greater than filters from `DocumentStoreBaseTests` to separate class (#6344)
* Move `filter_documents` tests with greater than equal filters from `DocumentStoreBaseTests` to separate class (#6345)
* Move `filter_documents` tests with less than filters from `DocumentStoreBaseTests` to separate class (#6346)
* Move `filter_documents` tests with less than equal filters from `DocumentStoreBaseTests` to separate class (#6347)
* Move `filter_documents` tests with simple logical filters from `DocumentStoreBaseTests` to separate class (#6348)
* Move `filter_documents` tests with nested logical filters from `DocumentStoreBaseTests` to separate class (#6349)
* fix un-flattening of metadata
* test should pass
* add relnote
* change policy: raise an error if both meta and keys are passed
* Update document.py
* support Python 3.8
* adjust wording in the error message
* Load additional fields from SQUAD-format file to meta field for labels
* added a test function
* rewritten test using pytest
* added release notes
* improve release note
* clean up test
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>