* Update README.md
* Incorporate link into Haystack logo
* Fix jobs link
* Update tutorials and demo
* Change order of sections
* Rename tutorial section
* Create jobs and community sections
* Change wording
* Change section title
* Change wording
* Add tutorial links and pipeline image
* Truncate overly large tables for TableReader
* Add documentation
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Files moved, imports all broken
* Fix most imports and docstrings into
* Fix the paths to the modules in the API docs
* Add latest docstring and tutorial changes
* Add a few pipelines that were lost in the imports
* Fix a bunch of mypy warnings
* Add latest docstring and tutorial changes
* Create a file_classifier module
* Add docs for file_classifier
* Fixed most circular imports, now the REST API can start
* Add latest docstring and tutorial changes
* Tackling more mypy issues
* Reintroduce from FARM and fix last mypy issues hopefully
* Re-enable old-style imports
* Fix some more imports from the top-level package in an attempt to sort out circular imports
* Fix some imports in tests to new-style to prevent failed class equalities from breaking tests
* Change document_store into document_stores
* Update imports in tutorials
* Add latest docstring and tutorial changes
* Probably fixes summarizer tests
* Improve the old-style import allowing module imports (should work)
* Try to fix the docs
* Remove dedicated KnowledgeGraph page from autodocs
* Remove dedicated GraphRetriever page from autodocs
* Fix generate_docstrings.sh with an updated list of yaml files to look for
* Fix some more modules in the docs
* Fix the document stores docs too
* Fix a small issue on Tutorial14
* Add latest docstring and tutorial changes
* Add deprecation warning to old-style imports
* Remove stray folder and import Dict into dense.py
* Change import path for MLFlowLogger
* Add old loggers path to the import path aliases
* Fix debug output of convert_ipynb.py
* Fix circular import on BaseRetriever
* Missed one merge block
* re-run tutorial 5
* Fix imports in tutorial 5
* Re-enable squad_to_dpr CLI from the root package and move get_batches_from_generator into document_stores.base
* Add latest docstring and tutorial changes
* Fix typo in utils __init__
* Fix a few more imports
* Fix benchmarks too
* New-style imports in test_knowledge_graph
* Rollback setup.py
* Rollback squad_to_dpr too
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
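The old-style imports that were re-enabled and later given a deprecation warning can be pictured with a minimal sketch. It assumes a module-level `__getattr__` (PEP 562) on an alias module registered in `sys.modules`; the helper `register_deprecated_alias` and the module names are hypothetical and not necessarily how this change did it.

```python
import importlib
import sys
import types
import warnings


def register_deprecated_alias(old_name: str, new_name: str) -> types.ModuleType:
    """Hypothetical helper: make `old_name` importable as a thin alias of `new_name`.

    Each public attribute looked up on the alias is forwarded to the new module
    and emits a DeprecationWarning pointing at the new-style import path.
    """
    alias = types.ModuleType(old_name)

    def __getattr__(attr: str):
        # Let the import machinery probe dunders such as __path__ without noise.
        if attr.startswith("__"):
            raise AttributeError(attr)
        warnings.warn(
            f"'{old_name}' is deprecated, import from '{new_name}' instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return getattr(importlib.import_module(new_name), attr)

    # PEP 562: a __getattr__ in the module namespace handles missing attributes.
    alias.__getattr__ = __getattr__
    sys.modules[old_name] = alias
    return alias
```

After `register_deprecated_alias("legacy_math", "math")`, a statement like `from legacy_math import sqrt` still works but warns, which keeps old user code running while nudging it toward the new paths.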
* Experimental changes to support Milvus 2.x
* Milvus 2.0 needs other containers, hence adding them
* Add latest docstring and tutorial changes
* Fixing tests
* Correcting use of list collections
* correcting connection close
* Removing connection close logic
* removing flush
* using collection instead of connection
* fixing describe collection
* Fixing insert, query and search based on new signature
* Making mypy happy
* Fixing one test case
* Fixing search and embedding fetch based on newer api
* Implementing delete vector id function
* Wrapping up final changes
* Add latest docstring and tutorial changes
* Correcting requirements.txt
* removing empty line in requirements.txt
* add docstring and exception for delete
* add docstring. condition import on env var. raise exception for deletion
* fix typo
* change delete signature
* ignore typing for import
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Update jobs link to personio
* Add latest docstring and tutorial changes
* Change jobs link to main website
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add delete_labels() except for weaviate doc store
* Add latest docstring and tutorial changes
* Add test for delete_labels()
* Adapt filter for label deletion to different doc stores in test
* Allow delete labels by _id in elasticsearch
* Add latest docstring and tutorial changes
* Add latest docstring and tutorial changes
* re-add bugfix after merge
* Add ids as optional parameter
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
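How `delete_labels()` with optional `ids` and `filters` might behave can be shown with a toy in-memory store. This is a sketch under assumptions: `Label`, `InMemoryLabelStore`, and the semantics (filters map a meta field to accepted values, `ids` and `filters` must both match, no arguments deletes everything) are illustrative, not taken from the document stores themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Label:
    id: str
    query: str
    meta: Dict[str, str] = field(default_factory=dict)


class InMemoryLabelStore:
    """Hypothetical store illustrating delete_labels(ids=..., filters=...)."""

    def __init__(self) -> None:
        self.labels: Dict[str, Label] = {}

    def write_labels(self, labels: List[Label]) -> None:
        for label in labels:
            self.labels[label.id] = label

    def delete_labels(self, ids: Optional[List[str]] = None,
                      filters: Optional[Dict[str, List[str]]] = None) -> None:
        def matches(label: Label) -> bool:
            # Assumption: when both are given, a label must satisfy ids AND filters.
            if ids is not None and label.id not in ids:
                return False
            if filters:
                return all(label.meta.get(key) in values for key, values in filters.items())
            return True

        # With neither ids nor filters, every label matches and is deleted.
        self.labels = {key: lab for key, lab in self.labels.items() if not matches(lab)}
```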
* Rename TransformersAdamW into simply AdamW (probably changed in transformers at some point)
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Add node names validation
* Add tests
* Improve test and test that params exist before validating
* Fix the REST API
* Use minilm-uncased-squad2 instead of roberta-base-squad2
* Use roberta model for test_pipeline.yaml
* Turn off TOKENIZERS_PARALLELISM in generator tests (#1605)
* Account for non-targeted parameters
* Restore previous parameters handling in the rest api
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
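The node name validation, together with the "params exist" check and the handling of non-targeted parameters, can be sketched as follows. The function `validate_params` is hypothetical, and it assumes that a dict-valued key targets a node by name while any other key is a global (non-targeted) parameter.

```python
from typing import Any, Dict, Iterable, Optional


def validate_params(params: Optional[Dict[str, Any]], node_names: Iterable[str]) -> None:
    """Hypothetical check that per-node params only target nodes that exist.

    Assumption: a key whose value is a dict targets a node by name; any other
    key is a non-targeted parameter and is not checked against node names.
    """
    if not params:  # nothing to validate when no params were passed
        return
    known = set(node_names)
    unknown = [key for key, value in params.items()
               if isinstance(value, dict) and key not in known]
    if unknown:
        raise ValueError(f"No node(s) named {', '.join(sorted(unknown))} found in the pipeline.")
```

Failing early on a misspelled node name avoids the silent case where a parameter is simply never applied.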
* adding checkpoint creation to the train function in FARMReader
* added arguments for create_or_load_checkpoint function
* accessing class method inside Trainer class
* added checkpoint_root_dir, checkpoint_every and checkpoints_to_keep as arguments (with default values) for reader.train()
* change default values of checkpoint_root_dir and checkpoint_every
* update docstring and add Path conversion
Co-authored-by: girish.koushik <girish.koushik@diatoz.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
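The interplay of `checkpoint_root_dir`, `checkpoint_every` and `checkpoints_to_keep` can be illustrated with a minimal sketch. The helpers `save_checkpoint` and `train`, the `step_<n>` folder naming, and the placeholder `state.txt` file are all hypothetical; a real trainer would save model and optimizer state instead.

```python
import shutil
from pathlib import Path
from typing import Union


def save_checkpoint(checkpoint_root_dir: Union[str, Path], step: int,
                    checkpoints_to_keep: int) -> Path:
    """Write a placeholder checkpoint for `step` and delete all but the newest ones."""
    if checkpoints_to_keep < 1:
        raise ValueError("checkpoints_to_keep must be at least 1")
    root = Path(checkpoint_root_dir)  # accept both str and Path
    checkpoint_dir = root / f"step_{step}"
    checkpoint_dir.mkdir(parents=True, exist_ok=True)
    # Stand-in for the real model, optimizer and scheduler state.
    (checkpoint_dir / "state.txt").write_text(f"step={step}\n")
    # Sort numerically so that step_10 counts as newer than step_8.
    existing = sorted(root.glob("step_*"), key=lambda p: int(p.name.split("_")[1]))
    for old in existing[:-checkpoints_to_keep]:
        shutil.rmtree(old)
    return checkpoint_dir


def train(n_steps: int, checkpoint_root_dir: Union[str, Path],
          checkpoint_every: int, checkpoints_to_keep: int) -> None:
    for step in range(1, n_steps + 1):
        # ... one optimization step would run here ...
        if step % checkpoint_every == 0:
            save_checkpoint(checkpoint_root_dir, step, checkpoints_to_keep)
```

With 10 steps, `checkpoint_every=2` and `checkpoints_to_keep=3`, only the checkpoints for steps 6, 8 and 10 remain on disk.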
* Add endpoint to get documents by filter
* Add test for /documents/get_by_filter and extend the delete documents test
* Add rest_api/file-upload to .gitignore
* Make sure the document store is empty for each test
* Improve docstrings of delete_documents_by_filters and get_documents_by_filters
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Create extractor/entity.py
* Aggregate NER words into entities
* Support indexing
* Add docstrings
* Add utility for printing
* Update signature of run() to match BaseComponent
* Add test
* Modify simplify_ner_for_qa to return the dictionary and add its test
Co-authored-by: brandenchan <brandenchan@icloud.com>
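Aggregating NER words into entities can be sketched with word-level BIO tags (`B-PER`, `I-PER`, `O`, ...). This is an illustration only: `aggregate_entities`, the input format, and the `entity_group`/`word` output keys are assumptions, not the extractor's actual implementation.

```python
from typing import Dict, List, Optional, Tuple


def aggregate_entities(tagged: List[Tuple[str, str]]) -> List[Dict[str, str]]:
    """Hypothetical helper: merge (word, BIO tag) pairs into whole entities."""
    entities: List[Dict[str, str]] = []
    prev_type: Optional[str] = None  # entity type of the previous word, None after "O"
    for word, tag in tagged:
        if tag == "O":
            prev_type = None
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix == "I" and ent_type == prev_type:
            # Continuation of the entity being built: extend its text.
            entities[-1]["word"] += " " + word
        else:
            # A "B-" tag, or an "I-" tag without a matching entity before it.
            entities.append({"entity_group": ent_type, "word": word})
        prev_type = ent_type
    return entities
```

For example, `[("Angela", "B-PER"), ("Merkel", "I-PER"), ("visited", "O"), ("New", "B-LOC"), ("York", "I-LOC")]` yields one `PER` entity "Angela Merkel" and one `LOC` entity "New York".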
* Clarify PDF conversion, languages and encodings
The parameter name `valid_languages` may be a bit misleading when
reading only the tutorials. Users may incorrectly assume that it
restricts the conversion to those languages, whereas it is really
more of a check.
- Provided clarifications in the tutorials to highlight what
valid_languages does and that changing the encoding may give better
results for their language of choice
- Updated the command for `pdftotext` to the correct one
* Allow encodings for `convert_files_to_dicts`
- Add the option of passing an encoding to the converters. Even for some
Latin1 languages, the converter did not produce good results.
A potential issue is that the encoding defaults to None, which is the default
for the other converters, but not for the PDFToTextConverter. Could add
a check and change the encoding to Latin1 for PDFs if it is set to None.
I considered adding it to **kwargs, but since it may be a commonly used
feature that should be documented, I added it as a keyword argument instead.
Would love to hear your input and feedback on it.
* Set back PDF default encoding
* Update documentation
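The encoding handling for PDFs can be sketched as a small command builder for poppler's `pdftotext`, which accepts `-enc` and `-layout` and writes to stdout when the output file is `-`. The function `build_pdftotext_command` is hypothetical, and Latin1 as the PDF fallback is an assumption based on the discussion of the PDF default encoding.

```python
from typing import List, Optional

# Assumed fallback when no encoding is passed for a PDF.
PDF_DEFAULT_ENCODING = "Latin1"


def build_pdftotext_command(file_path: str, encoding: Optional[str] = None,
                            layout: bool = False) -> List[str]:
    """Hypothetical helper: build a pdftotext command that prints text to stdout."""
    # None means "not given", so fall back to the PDF default instead of
    # passing no encoding at all.
    enc = encoding if encoding is not None else PDF_DEFAULT_ENCODING
    command = ["pdftotext", "-enc", enc]
    if layout:
        command.append("-layout")  # keep the physical page layout
    command += [file_path, "-"]  # "-" sends the extracted text to stdout
    return command
```

The resulting list can be handed to `subprocess.run(..., capture_output=True)`; trying a different `encoding` is then a one-argument change when a language's characters come out garbled.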
* First rough implementation
* Add a flag to dump the debug logs to the console as well
* Typing run() and _dispatch_run()
* Allow debug and debug_logs to be passed as arguments of run()
* Avoid overwriting _debug, later we might want to store other objects in it
* Put logs under a separate key of the _debug dictionary and add input and output of the node alongside it
* Introduce global arguments for pipeline.run() that get applied to every node when defined
* Change default values of debug variables to None, otherwise their default would override the params values
* Remove a potential infinite recursion on the overridden __getattr__
* Do not append the output of the last node in the _debug key, it causes infinite recursion
* Add tests
* Move the input/output collection into _dispatch_run to gather only relevant info
* Add partial Pipeline.run() docstring
* Add latest docstring and tutorial changes
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
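The global `debug` argument and the `_debug` key can be pictured with a toy linear pipeline. This is a sketch, not `Pipeline.run()` itself: `run_pipeline` and the node signature are hypothetical, and it assumes an explicitly passed global value overrides the per-node params. It shows why the defaults are `None` (so an unset global never clobbers params) and why the recorded output excludes `_debug` (so no entry refers back to the dictionary that contains it).

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

# A "node" here is any callable taking keyword arguments and returning a dict.
Node = Callable[..., Dict[str, Any]]


def run_pipeline(nodes: List[Tuple[str, Node]], query: str,
                 params: Optional[Dict[str, Dict[str, Any]]] = None,
                 debug: Optional[bool] = None) -> Dict[str, Any]:
    """Toy linear pipeline with a global `debug` argument and a _debug key."""
    params = params or {}
    payload: Dict[str, Any] = {"query": query}
    debug_info: Dict[str, Any] = {}
    for name, node in nodes:
        node_params = dict(params.get(name, {}))
        # None means "not passed", so the global value only overrides the
        # per-node params when the caller actually set it.
        if debug is not None:
            node_params["debug"] = debug
        node_input = dict(payload)
        output = node(**node_input, **node_params)
        if node_params.get("debug"):
            # Store copies without any "_debug" key to avoid a reference cycle.
            debug_info[name] = {
                "input": node_input,
                "output": {k: v for k, v in output.items() if k != "_debug"},
            }
        payload = output
    payload["_debug"] = debug_info
    return payload
```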