* Remove api_key from serialization of AzureOCRDocumentConverter
* Remove api_key from serialization of SerperDevWebSearch
* Add release notes
* Add init_fail_without_api_key test for SerperDevWebSearch
* Rename env var to AZURE_AI_API_KEY
* move embedding backends
* use token in Sentence Transformers embeddings
* more compact token handling
* token parameter in reader
* add token to ranker
* release note
* add test for reader
* added hybrid search example
Added an example of hybrid search for an FAQ pipeline on the COVID dataset
* formatted with black formatter
* renamed document
* fixed
* fixed typos
* added test
added test for hybrid search
* fixed whitespaces
* removed test for hybrid search
* fixed pylint
* commented logging
* fixed bug in join_docs.py _concatenate_results
* Update join_docs.py
updated comment
* format with black
* added releasenote on PR
* updated release notes
* updated test_join_documents
* updated test
* updated test
* Update test_join_documents.py
* formatted with black
* fixed test
* fixed
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* Remove old cohere models
* Add aliases for the existing models according to Cohere documentation
* Add release note
* put cohere embedding models in a constant
* update doc strings
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* draft
* still a raw draft
* still a raw draft
* improvements
* minimal impl ok
* tests
* reno
* better language
* examples of generation_kwargs
* incorporate feedback
* lg and format updates
* don't save valid str tokens
* fix style
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
* draft TextLanguageClassifier
* implement language detection with langdetect
* add unit test for logging message
* reno
* pylint
* change input from List[str] to str
* remove empty output connections
* add from_dict/to_dict tests
* mark example usage as python code
* add telemetry to pipelines 2.0
* only collect data if telemetry is on
* reno
* add downsampling
* typing
* manual tests
* pylint
* simplify code
* Update haystack/preview/telemetry/__init__.py
* rather index by component type
* black
* mypy
* review feedback & small improvements
* defaultdict
* stray changes
* lint
* invert condition
* always send the first event of the day
* collect specs
* track 2nd and 3rd events too
* send first event and then max 1 event a minute
* rename constant
* invert condition
* linting
* added hybrid search example
Added an example of hybrid search for an FAQ pipeline on the COVID dataset
* formatted with black formatter
* renamed document
* fixed
* fixed typos
* added test
added test for hybrid search
* fixed whitespaces
* removed test for hybrid search
* fixed pylint
* commented logging
* updated hybrid search example
* release notes
* Update hybrid_search_faq_pipeline.py-815df846dca7e872.yaml
* Update hybrid_search_faq_pipeline.py
* mention hybrid search example in release notes
* reduce installed dependencies in examples test workflow
* do not install cuda dependencies
* skip models if API key not set; delete document indices
* skip models if API key not set; delete document indices
* skip models if API key not set; delete document indices
* keep roberta-base model and inference extra
* pylint
* disable pylint no-logging-basicconfig rule
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* chore: added on_agent_final_answer-support to Agent callback_manager
* chore: format black
* run pre-commit to format file
* updated release notes
* reverted sorted imports
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* draft split by word, sentence, passage
* naive way to split sentences without nltk
* reno
* add tests
* make input list of docs, review feedback
* add source_id and more validation
* update docstrings
* add split delimiters back to strings
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Add Pipeline.arun()
* Sleeper node
* Fix async running
* Add e2e tests
To run a Pipeline that doesn't have any async node in async mode:
pytest e2e/pipelines/test_standard_pipelines.py::test_query_and_indexing_pipeline
To run a Pipeline that has a single async node in concurrent mode:
pytest e2e/pipelines/test_standard_pipelines.py::test_async_concurrent_complex_pipeline
To run a Pipeline that has a single async node in sequential mode:
pytest e2e/pipelines/test_standard_pipelines.py::test_async_sequential_complex_pipeline
* Remove unused _adispatch_run method
* Make Pipeline.run work with async nodes
* Revert "Make Pipeline.run work with async nodes"
This reverts commit 22d7a94e4d41aca1b59dad18c0b366fbb6e8f431.
* Rename Pipeline.arun to Pipeline._arun
* Enhance docstring
* Add Sleeper docstring
* Add release notes
* ignore typing across the node
* make pylint happy
* skip pylint on needed unused import
* fix
* if a node has an arun method, use it
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* copy the deps list over from haystack-ai
* fix lazyimport usage
* keep jinja and openai
* fix ci
* reno
* separate out preview unit tests
* fix import error message for tika
* tika
* add preview to all
* wrap torch
* remove comment
* unwrap openai and jinja
* Add TikaFileToDocument component
* Add tests
* Add tika service to CI
* Add release note
* Change name
* PR feedback
* Fix naming in tests
* Fix tika version in CI
* Update tests
---------
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>