* Add pre-commit config
* update contributing guidelines
* try failing the workflow
* add pre-commit to the deps
* updating uninstall instructions
* separate jobs in CI
* make tutorials check fail
* make black check fail
* make openapi check fail
* make yaml schema and api docs checks fail
* highlight the instructions
* Update .pre-commit-config.yaml
Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>
* Update CONTRIBUTING.md
Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>
* Update CONTRIBUTING.md
Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>
* Use black --check
* Add images of the CI
* title level
* feedback
Co-authored-by: Tobias Wochinger <mail@tobias-wochinger.de>
* restart tutorials in the loop
* remove container steps in tutorials.yml
* forgotten quotes
* unmatched bracket
* give names to containers
* try to limit the log size
* make the containers restart on the scripts as well
* feedback
* Raise integration tests timeout
* raising limit again
* clean up tests and run earlier
* use change detection
* better naming, skip ES
* more cleanup
* fix job name
* dummy commit to trigger the CI
* mock away the PDF converter
* make the test compatible with 3.7
* removed leftover
* always run the api tests, use a matrix for the OS
* refactor all the tests
* remove outdated dependency
* pylint
* new abstract method
* adjust for older python versions
* rename pipeline file
* address PR comments
* Remove caching and install audio deps
* Fix `Tutorials` as well
* Run all tutorials even though some fail
* Forgot fi
* fix failure condition
* proper bash string equality
* Enable debug logs
* remove audio files
* Update Documentation & Code Style
* Use the setup action in the Tutorial CI as well
* Try with a file that exists
* Update Documentation & Code Style
* Fix the comments in the tutorials
* Update Documentation & Code Style
* Fix tutorials.sh
* Remove debug logging
* import pprint and try editable install
* Update Documentation & Code Style
* extract no run list
* Add tutorial18 to no run list nightly
* import pprint correctly
* Update Documentation & Code Style
* try making site-packages editable
* Make pythonpath editable every time Tut17 is run on CI
* typo
* fix imports in tut5
* add git clean
* Update Documentation & Code Style
* add comments and remove `-e`
* accidentally deleted a line
* Update .github/utils/tutorials.sh
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* Add new audio answer primitives
* Add AnswerToSpeech
* Add dependency group
* Update Documentation & Code Style
* Extract TextToSpeech in a helper class, create DocumentToSpeech and primitives
* Add tests
* Update Documentation & Code Style
* Add ability to compress audio and more tests
* Add audio group to test, all and all-gpu
* fix pylint
* Update Documentation & Code Style
* Accidental git tag
* Try pleasing mypy
* Update Documentation & Code Style
* fix pylint
* Add warning for missing OS library and support in CI
* Try fixing mypy
* Update Documentation & Code Style
* Add docs, simplify args for audio nodes and add tutorials
* Fix mypy
* Fix run_batch
* Feedback on tutorials
* fix mypy and pylint
* Fix mypy again
* Fix mypy yet again
* Fix the ci
* Fix dicts merge and install ffmpeg on CI
* Make the audio nodes import safe
* Trying to increase tolerance in audio test
* Fix import paths
* fix linter
* Update Documentation & Code Style
* Add audio libs in unit tests
* Update _text_to_speech.py
* Update answer_to_speech.py
* Use dedicated dataset & update telemetry
* Remove and use distilled roberta
* Revert special primitives so that the nodes run in indexing
* Improve tutorials and fix smaller bugs
* Update Documentation & Code Style
* Fix serialization issue
* Update Documentation & Code Style
* Improve tutorial
* Update Documentation & Code Style
* Update _text_to_speech.py
* Minor lg updates
* Minor lg updates to tutorial
* Making indexing work in tutorials
* Update Documentation & Code Style
* Improve docstrings
* Try to use GPU when available
* Update Documentation & Code Style
* Fix mypy and pylint
* Try to pass the device correctly
* Update Documentation & Code Style
* Use type of device
* use .cpu()
* Improve .ipynb
* update apt index to be able to download libsndfile1
* Fix SpeechDocument.from_dict()
* Change pip URL
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Agnieszka Marzec <97166305+agnieszka-m@users.noreply.github.com>
* Experimental CI workflow for running tutorials
* Run on every push for now
* Not starting?
* Disabling paths temporarily
* Sort tutorials in natural order
* Install ipython
* remove ipython install
* Try running ipython with sudo
* env.pythonLocation
* Skipping tutorial2 and 9 for speed
* typo
* Use one runner per tutorial, for now
* Typo in dependent job
* Missing quotes broke scripts matrix
* Simplify setup for the tutorials, try to prevent containers conflict
* Remove needless job dependencies
* Try prevent cache issues, fix small Tut10 bug
* Missing deps for running notebook tutorials
* Create three groups of tutorials excluding the longest among them
* remove deps
* use proper bash loop
* Try with a single string
* Fix typo in echo
* Forgot do
* Typo
* Try to make the GraphDB tutorial without launching its own container
* Run notebook and script together
* Whitespace
* separate scripts and notebooks execution
* Run notebooks first
* Try caching the GoT data before running the scripts
* add note
* fix mkdir
* Fix path
* Update Documentation & Code Style
* missing -r
* Fix folder numbering
* Run notebooks as well
* Typo in notebook command
* complete path in notebook command
* Try with TIKA_LOG_PATH
* Fix folder naming
* Do not use cached data in Tut9
* extracting the number better
* Small tweaks
* Same fix on Tut10 on the notebook
* Exclude GoT cache for tut5 too
* Remove faiss files after tutorial run
* Layout
* fix remove command
* Fix path in tut10 notebook
* Fix typo in node name in tut14
* Third block was too long, rebalancing
* Reduce GoT dataset even more, why waste time after all...
* Fix paths in tut10 again
* do git clean to make sure to cleanup everything (breaks post Python)
* Remove ES file with bad permission at the end of the run
* Split first block, takes >30mins
* take out tut15 for a moment, has an actual bug
* typo
* Forgot rm option
* Simply remove all ES files
* Improve logs of GoT reduction
* Exclude also tut16 from cache to try fix bug
* Replace ll with ls
* Reintroduce 15_TableQA
* Small regrouping
* regrouping to make the min num of runners go for about 30mins
* Add cron schedule and PR paths conditions
* Add some timing information
* Separate tutorials by diff and tutorials by cron
* temp add pull_request to tutorials nightly
* Add badge in README to keep track of the nightly tutorials run
* Remove prefixes from data folder names
* Add fetch depth to get diff with master
* Fix paths again
* typo
* Exclude long-running ones
* Typo
* Fix tutorials.yml as well
* Use head_ref
* Using an action for now
* exclude other files
* Use only the correct command to run the tutorial
* Add long running tutorials in separate runners, just for experiment
* Factor out the complex bash script
* Pass the python path to the bash script
* Fix paths
* adding log statement
* Missing dollar sign
* Resetting variable in loop
* using mini GoT dataset and improving bash script
* change dataset name
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Unify CI tests (from #2466)
* Update Documentation & Code Style
* Change folder names
* Fix markers list
* Remove marker 'slow', replaced with 'integration'
* Soften children check
* Start ES first so it has time to boot while Python is setup
* Run the full workflow
* Try to make pip upgrade on Windows
* Set KG tests as integration
* Update Documentation & Code Style
* typo
* faster pylint
* Make Pylint use the cache
* filter diff files for pylint
* debug pylint statement
* revert pylint changes
* Remove path from asserted log (fails on Windows)
* Skip preprocessor test on Windows
* Tackling Windows specific failures
* Fix pytest command for windows suites
* Remove \ from command
* Move poppler test into integration
* Skip opensearch test on windows
* Add tolerance in reader sas score for Windows
* Another pytorch approx
* Raise time limit for unit tests :(
* Skip poppler test on Windows CI
* Specify to pull with FF only in docs check
* temporarily run the docs check immediately
* Allow merge commit for now
* Try without fetch depth
* Accelerating test
* Accelerating test
* Add repository and ref alongside fetch-depth
* Separate out code&docs check from tests
* Use setup-python cache
* Delete custom action
* Remove the pull step in the docs check, will find a way to run on bot commits
* Add requirements.txt in .github for caching
* Actually install dependencies
* Change deps group for pylint
* Unclear why the requirements.txt is still required :/
* Fix the code check python setup
* Install all deps for pylint
* Make the autoformat check depend on tests and doc updates workflows
* Try installing dependencies in another order
* Try again to install the deps
* quoting the paths
* Add back the requirements
* Try again to install rest_api and ui
* Change deps group
* Duplicate haystack install line
* See if the cache is the problem
* Disable also in mypy, who knows
* split the install step
* Split install step everywhere
* Revert "Separate out code&docs check from tests"
This reverts commit 1cd59b15ffc5b984e1d642dcbf4c8ccc2bb6c9bd.
* Add back the action
* Proactive support for audio (see text2speech branch)
* Fix label generator tests
* Remove install of libsndfile1 on win temporarily
* exclude audio tests on win
* install ffmpeg for integration tests
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
This PR changes the instance type of the public Haystack demo from p3.2xlarge to g4dn.2xlarge.
g4dn.2xlarge has 1 GPU, 8 vCPUs, and 32 GiB of memory.
p3.2xlarge had 1 GPU, 8 vCPUs, and 61 GiB of memory.
Switching to g4dn.2xlarge results in 75% lower costs.
I also tried out the even smaller g4dn.xlarge, which has 1 GPU, 4 vCPUs, and 16 GiB of memory. However, that memory was not enough to run the demo. I tried out multiple requests at the same time and it worked well with g4dn.2xlarge. Requests are slightly slower than with the more powerful instance type, but the difference is hard to notice.
* Update version to 1.4.1rc0
* Add hint of enabling action on the fork in the PR template
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Move super in OpenSearchDocumentStore and add small test
* Update Documentation & Code Style
* Add Opensearch container to the CI
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Change exception into warning, add strict_version param, and remove compatibility between schemas
* Simplify update_json_schema
* Rename unstable into master
* Prevent validate_config from changing the config to validate
* Fix version validation and add tests
* Rename master into ignore
* Complete parameter rename
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Upgrade pdftotext also on pinecone and milvus1 jobs
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* remove duplicate imports
* fix ungrouped-imports
* Fix wrong-import-position
* Fix unused-import
* pyproject.toml
* Working on wrong-import-order
* Solve wrong-import-order
* fix Pool import
* Move open_search_index_to_document_store and elasticsearch_index_to_document_store in elasticsearch.py
* remove Converter from modeling
* Fix mypy issues on adaptive_model.py
* create es_converter.py
* remove converter import
* change import path in tests
* Restructure REST API to not rely on global vars from search.py and improve tests
* Fix openapi generator
* Move variable initialization
* Change type of FilterRequest.filters
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
* Fix 'bug' on Weaviate only returning max. 100 docs on get_all_documents
* Add type
* Update Weaviate version on the CI
* Fix bug on get_document_count where there are no documents
* Add more info in the docstrings of get_all_documents and get_all_documents_generator
* Add latest docstring and tutorial changes
* Apply Black
* Update Documentation & Code Style
* Trigger pipeline
* Update Documentation & Code Style
* Include StefanBogdan feedback
* Fix mypy issues and LogicalFilterClause
* Add more types
* Update Documentation & Code Style
* update setup.cfg
* Upgrade weaviate containers too
* Allow to filter for content field in Weaviate
* Use convert_to_weaviate instead of convert_to_pinecone
* Fix _get_all_documents_in_index
* Update docstrings and docs
* Catching an exception in get_document(s)_by_id
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* Run Pinecone tests only if files related to Pinecone changed
* Change in pinecone.py that will be reverted
* Revert change in pinecone.py
* Test Pinecone also when filter_utils.py changes
* added core install and functionality of pinecone doc store (init, upsert, query, delete)
* implemented core functionality of Pinecone doc store
* Update Documentation & Code Style
* updated filtering to use Haystack filtering and reduced default batch_size
* Update Documentation & Code Style
* removed debugging code
* updated Pinecone filtering to use filter_utils
* removed unneeded methods and minor tweaks to current methods
* fixed typing issues
* Update Documentation & Code Style
* Allow filters in all methods except get_embedding_count
* Fix skipping document store tests
* Update Documentation & Code Style
* Fix handling of Milvus1 and Milvus2 in tests
* Update Documentation & Code Style
* Fix handling of Milvus1 and Milvus2 in tests
* Update Documentation & Code Style
* Remove SQL from tests requiring embeddings
* Update Documentation & Code Style
* Fix get_embedding_count of Milvus2
* Make sure to start Milvus2 tests with a new collection
* Add pinecone to test suite
* Update Documentation & Code Style
* Fix typing
* Update Documentation & Code Style
* Add pinecone to docstores dependency
* Add PineconeDocStore to API Documentation
* Add missing comma
* Update Documentation & Code Style
* Adapt format of doc strings
* Update Documentation & Code Style
* Set API key as environment variable
* Skip Pinecone tests in forks
* Add sleep after deleting index
* Add sleep after deleting index
* Add sleep after creating index
* Add check if index ready
* Remove printing of index stats
* Create new index for each pinecone test
* Use RestAPI instead of Python API for describe_index_stats
* Fix accessing describe_index_stats
* Remove usages of describe_index_stats
* Run pinecone tests separately
* Update Documentation & Code Style
* Add pdftotext to pinecone tests
* Remove sleep from doc store fixture
* Add describe_index_stats
* Remove unused imports
* Use pull_request_target trigger
* Revert use pull_request_target trigger
* Remove set_config
* Add os to conftest
* Integrate review comments
* Set include_values to False
* Remove quotation marks from pinecone.Index type
* Update Documentation & Code Style
* Update Documentation & Code Style
* Fix number of args in error messages
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: bogdankostic <bogdankostic@web.de>
* add basic telemetry features
* change pipeline_config to _component_config
* Update Documentation & Code Style
* add super().__init__() calls to error classes
* make posthog mock work with python 3.7
* Update Documentation & Code Style
* update link to docs web page
* log exceptions, send event for raised HaystackErrors, refactor Path(CONFIG_PATH)
* add comment on send_event in BaseComponent.init() and fix mypy
* mock NonPrivateParameters and fix pylint undefined-variable
* Update Documentation & Code Style
* check model path contains multiple /
* add test for writing to file
* add test for en-/disable telemetry
* Update Documentation & Code Style
* merge file deletion methods and ignore pylint global statement
* Update Documentation & Code Style
* set env variable in demo to activate telemetry
* fix mock of HAYSTACK_TELEMETRY_ENABLED
* fix mypy and linter
* add CI as env variable to execution contexts
* remove threading, add test for custom error event
* Update Documentation & Code Style
* simplify config/log file deletion
* add test for final event being sent
* force writing config file in test
* make test compatible with python 3.7
* switch to posthog production server
* Update Documentation & Code Style
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>