* test pre commit hook
* test status
* test on this branch
* push generated docstrings and tutorials to branch
* fixed syntax error
* Add latest docstring and tutorial changes
* add files before commit
* catch commit error
* separate generation from deployment
* add deployment process for staging
* add current branch to payload
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* use Path class in method add_eval_data of haystack.document_store.base.py
* change type of jsonl_filename, as squad_json_to_jsonl and add_eval_data expect a string
* Make batchwise adding of evaluation data possible
* Fix typos in docstrings
* Merge add_eval_data and add_eval_data_batchwise
* Improve import statements
* Move add_eval_data to BaseDocumentStore
* Add batch_size param to write_documents and write_labels in EsDocStore
* Adjust docstring
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* automate docstring and tutorial generation with every push to master
* test CI for current branch
* fixed yaml syntax
* add setuptools to install process
* checkout repo
* fixed command for shell script
* install wheel as it is needed for CI
* install mkdocs
* test without shell script
* use package from github actions
* test other configuration
* back to right config
* cleaning script
* Integration of SummarizationQAPipeline with Haystack.
* Moving summarizer tests because of OOM issue
* Fixing typo
* Splitting summarizer test into a separate CI step
* Removing sysctl configuration, as we are already running Elasticsearch in a Docker container
* fixing mypy issue
* update parameter names and docstrings
* update parameter names in BaseSummarizer
* rename pipeline
* change return type of summarizer from answer to document
* change scope of doc store fixture
* revert scope
* temp. disable test_faiss_index_save_and_load()
* fix mypy. change order for mypy in CI
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
* Using column names instead of the ORM object for the get-all-documents call
* Separating meta search from documents. This reduces memory usage by not duplicating document.text
* Fixing mypy issue
* SQLite has a limit on the number of host variables, hence using batching to fetch meta information
* Query meta only if meta field is not Null in DocOrm
* Add batch_size to other functions except label
* meta can be None, so fix that issue
* Dummy commit to trigger CI
* Using chunked dictionary
* Upgrading faiss
* reverting change related to faiss upgrade
* Changing DB name in test_faiss_retrieving test, as it might interfere with existing files and corrupt the DB file
* Updating doc string related to batch_size
* Update docstring for batch_size
Co-authored-by: Malte Pietsch <malte.pietsch@deepset.ai>
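The batching idea behind the SQLite commits above can be sketched roughly as follows. This is a minimal illustration, not the Haystack SQLDocumentStore code: the `meta` table with `document_id`, `name`, `value` columns and the helpers `chunked` and `fetch_meta` are hypothetical, and the batch size of 999 assumes the conservative default host-parameter limit of SQLite builds older than 3.32.0. Each `IN (...)` query binds at most `batch_size` parameters, so no single statement exceeds the limit.

```python
import sqlite3
from collections import defaultdict
from typing import Dict, Iterator, List, Sequence

# Hypothetical conservative limit: SQLite builds before 3.32.0 allow at most 999 host parameters.
MAX_HOST_VARIABLES = 999


def chunked(items: Sequence, size: int) -> Iterator[Sequence]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_meta(conn: sqlite3.Connection, doc_ids: List[str],
               batch_size: int = MAX_HOST_VARIABLES) -> Dict[str, Dict[str, str]]:
    """Fetch meta for many documents with one IN (...) query per batch (hypothetical schema)."""
    meta: Dict[str, Dict[str, str]] = defaultdict(dict)
    for batch in chunked(doc_ids, batch_size):
        placeholders = ",".join("?" for _ in batch)
        rows = conn.execute(
            f"SELECT document_id, name, value FROM meta WHERE document_id IN ({placeholders})",
            list(batch),
        )
        for doc_id, name, value in rows:
            meta[doc_id][name] = value
    return dict(meta)


# Usage: 2500 ids would exceed a 999-parameter limit in one query, so three batches are issued.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE meta (document_id TEXT, name TEXT, value TEXT)")
conn.executemany(
    "INSERT INTO meta VALUES (?, ?, ?)",
    [(f"doc{i}", "source", f"file{i}.txt") for i in range(2500)],
)
result = fetch_meta(conn, [f"doc{i}" for i in range(2500)])
print(len(result))  # 2500
```

Only the placeholder count per statement is bounded; the total number of documents fetched is not.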
* WIP: First version of preprocessing tutorial
* stride renamed to overlap, ipynb and py files created
* rename split_stride in test
* Update preprocessor api documentation
* define order for markdown files
* define order of modules in api docs
* Add colab links
* Incorporate review feedback
Co-authored-by: PiffPaffM <markuspaff.mp@gmail.com>
* new docs version
* updated directory structure
* Add pipelines page
* Add Finder deprecation suggestion
* header for pipelines file
* Document MySQL support
* Mention DPR train tutorial coming soon
* Mention open distro ES
* Update doc strings regarding similarity fn
* Add link to API docs
* Wrap pipelines docs in box
* add api reference for pipelines
* copied latest version to v0.6.0
* Remove space
* Remove space
* Copy to v0.6.0
Co-authored-by: brandenchan <brandenchan@icloud.com>
* Update preprocessor.py
Concatenation of sentences done correctly. Stride functionality enabled for splitting by words while respecting sentence boundaries.
* Simplify code, add test
Co-authored-by: Krak91 <45461739+Krak91@users.noreply.github.com>
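Splitting by words with an overlap while respecting sentence boundaries, as described in the preprocessor commit above, can be sketched like this. It is a hypothetical helper, not the actual `PreProcessor` implementation, and it assumes the text has already been split into sentences. Whole sentences are packed into chunks of at most `split_length` words, and each new chunk starts with trailing sentences of the previous chunk totalling at most `split_overlap` words.

```python
from typing import List


def split_by_words_respecting_sentences(sentences: List[str], split_length: int,
                                        split_overlap: int) -> List[str]:
    """Hypothetical sketch: pack whole sentences into chunks of at most `split_length` words.

    A single sentence longer than `split_length` still becomes its own chunk. Consecutive
    chunks share trailing sentences totalling at most `split_overlap` words.
    """
    chunks: List[str] = []
    current: List[str] = []
    current_words = 0
    for sentence in sentences:
        n_words = len(sentence.split())
        if current and current_words + n_words > split_length:
            chunks.append(" ".join(current))
            # Carry trailing sentences over as overlap, but never so many that the
            # next chunk (overlap + this sentence) would exceed split_length.
            overlap: List[str] = []
            overlap_words = 0
            for prev in reversed(current):
                prev_words = len(prev.split())
                if (overlap_words + prev_words > split_overlap
                        or overlap_words + prev_words + n_words > split_length):
                    break
                overlap.insert(0, prev)
                overlap_words += prev_words
            current, current_words = overlap, overlap_words
        current.append(sentence)
        current_words += n_words
    if current:
        chunks.append(" ".join(current))
    return chunks


print(split_by_words_respecting_sentences(["a b.", "c d.", "e f.", "g h."], 4, 2))
# ['a b. c d.', 'c d. e f.', 'e f. g h.']
```

Capping the overlap by both limits guarantees each new chunk contains at least one new sentence, so the loop always makes progress.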