* Experimental changes to support Milvus 2.x
* Milvus 2.0 need other containers hence adding them
* Add latest docstring and tutorial changes
* Fixing tests
* Correcting use of list collections
* correcting connection close
* Removing connection close logic
* removing flush
* using collection instead of connection
* fixing describe collection
* Fixing insert, query and search based on new signature
* Making mypy happy
* Fixing one test case
* Fixing search and embedding fetch based on newer api
* Implementing delete vector id function
* Wrapping up final changes
* Add latest docstring and tutorial changes
* Correcting requirements.txt
* removing empty line in requirements.txt
* add docstring and exception for delete
* add docstring. condition import on env var. raise exception for deletion
* fix typo
* change delete signature
* ignore typing for import
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Update jobs link to personio
* Add latest docstring and tutorial changes
* Change jobs link to main website
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add delete_labels() except for weaviate doc store
* Add latest docstring and tutorial changes
* Add test for delete_labels()
* Adapt filter for label deletion to different doc stores in test
* Allow delete labels by _id in elasticsearch
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
* re-add bugfix after merge
* Add ids as optional parameter
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Rename TransformersAdamW into simply AdamW (probably changed in transformers at some point)
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add node names validation
* Add tests
* Improve test and test that params exists before validating
* Fix the REST API
* Use minilm-uncased-squad2 instead of roberta-base-squad2
* Use roberta model for test_pipeline.yaml
* Turn off TOKENIZERS_PARALLELISM in generator tests (#1605)
* Account for non-targeted parameters
* Restore previous parameters handling in the rest api
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* adding create checkpoint feature for train function in farm reader
* added arguments for create_or_load_checkpoint function
* accessing class method inside Trainer class
* added default value for checkpoint_root_dir and checkpoint_every, checkpoints_to_keep as arguments for reader.train()
* change in default value for checkpoint_root_dir and checkpoint_every
* update docstring and add Path conversion
Co-authored-by: girish.koushik <girish.koushik@diatoz.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Add endpoint to get documents by filter
* Add test for /documents/get_by_filter and extend the delete documents test
* Add rest_api/file-upload to .gitignore
* Make sure the document store is empty for each test
* Improve docstrings of delete_documents_by_filters and get_documents_by_filters
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Create extractor/entity.py
* Aggregate NER words into entities
* Support indexing
* Add doc strings
* Add utility for printing
* Update signature of run() to match BaseComponent
* Add test
* Modify simplify_ner_for_qa to return the dictionary and add its test
Co-authored-by: brandenchan <brandenchan@icloud.com>
* Clarify PDF conversion, languages and encodings
The parameter name `valid_languages` may be a bit miss-leading from
reading only the tutorials. Users may, incorrectly assume that it
enforces that the conversions only works for those languages, then it's
more of a check.
- Provided clarifications in the tutorials to highlight what
valid_languages does and that changing the encoding may give better
results for their language of choice
- Updated the command for `pdftotext` to the correct one
* Allow encodings for `convert_files_to_dicts`
- Set option of passing encoding to the converters. Trying even for some
Latin1 languages, the converter does not do it in a good way.
Potential issues is that the encoding defaults to None, which is default
for the other converters, but not for the PDFToTextConverter. Could add
a check and change the ending to Latin1 for pdf if set to None.
Was considering adding it to **kwargs, but since it may be a commonly
used feature to be documented, I added it as a keyword argument instead.
Would love to hear your input and feedback on in.
* Set back PDF default encoding
* Update documentation
* First rough implementation
* Add a flag to dump the debug logs to the console as well
* Typing run() and _dispatch_run()
* Allow debug and debug_logs to be passed as arguments of run()
* Avoid overwriting _debug, later we might want to store other objects in it
* Put logs under a separate key of the _debug dictionary and add input and output of the node alongside it
* Introduce global arguments for pipeline.run() that get applied to every node when defined
* Change default values of debug variables to None, otherwise their default would override the params values
* Remove a potential infinite recursion on the overridden __getattr__
* Do not append the output of the last node in the _debug key, it causes infinite recursion
* Add tests
* Move the input/output collection into _dispatch_run to gather only relevant info
* Add partial Pipeline.run() docstring
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Add rest api endpoint to delete documents by filter.
* Remove parametrization of rest api tests
* Make the paths in rest_api/config.py absolute
* Fix path to pipelines.yaml
* Restructuring test_rest_api.py to be able to test only my endpoint (and to make the suite more structured)
* Convert DELETE /documents into POST /documents/delete_by_filters
Co-authored by: sarthakj2109 <54064348+sarthakj2109@users.noreply.github.com>
* Initial draft of TransformersClassifier
* Add transformers classifier implementation
* Add test for SentenceTransformersClassifier
* Add truncation and corresponding test case to Classifier
* Add zero-shot classification and test
* Add document classifier documentation
* Add latest docstring and tutorial changes
* print meta data with print_documents()
* Add latest docstring and tutorial changes
* Remove top_k param from Classifier usage example
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Make InMemoryDocumentStore accept and apply filters in delete_documents()
* Modify test_document_store.py to test the filtered deletion in memory, sql and milvus too
* Make FAISSDocumentStore accept and properly apply filters in delete_documents()
* Add latest docstring and tutorial changes
* Remove accidentally duplicated test
* Remove unnecessary decorators from test/test_document_store.py::test_delete_documents_with_filters
* Add embeddings count test for FAISS and Milvus; Milvus fails it.
* Fixed a bug that made Milvus not deleting embeddings
* Remove batch size parametrization in tests & update all documentstore's docstrings with a filter example
* Add latest docstring and tutorial changes
Co-authored-by: prafgup <prafulgupta6@gmail.com>