* added hybrid search example
Added an example of hybrid search for the FAQ pipeline on the COVID dataset
* formatted with black formatter
* renamed document
* fixed
* fixed typos
* added test
added test for hybrid search
* fixed whitespaces
* removed test for hybrid search
* fixed pylint
* commented out logging
* fixed bug in join_docs.py _concatenate_results
* Update join_docs.py
updated comment
* format with black
* added release note for the PR
* updated release notes
* updated test_join_documents
* updated test
* Update test_join_documents.py
* formatted with black
* fixed test
* fixed
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* Remove old cohere models
* Add aliases for the existing models according to the Cohere documentation
* Add release note
* put Cohere embedding models in a constant
* update doc strings
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* draft
* still a raw draft
* improvements
* minimal impl ok
* tests
* reno
* better language
* examples of generation_kwargs
* incorporate feedback
* lg and format updates
* don't save valid str tokens
* fix style
---------
Co-authored-by: Darja Fokina <daria.f93@gmail.com>
* draft TextLanguageClassifier
* implement language detection with langdetect
* add unit test for logging message
* reno
* pylint
* change input from List[str] to str
* remove empty output connections
* add from_dict/to_dict tests
* mark example usage as python code
* add telemetry to pipelines 2.0
* only collect data if telemetry is on
* reno
* add downsampling
* typing
* manual tests
* pylint
* simplify code
* Update haystack/preview/telemetry/__init__.py
* rather index by component type
* black
* mypy
* review feedback & small improvements
* defaultdict
* stray changes
* lint
* invert condition
* always send the first event of the day
* collect specs
* track 2nd and 3rd events too
* send first event and then max 1 event a minute
* rename constant
* invert condition
* linting
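The throttling policy described above ("always send the first event", then "max 1 event a minute") can be sketched as follows. This is a minimal illustration, not the telemetry code in haystack/preview/telemetry; the `EventThrottle` class and its `clock` parameter are hypothetical names introduced here so the behavior can be checked with a fake clock.

```python
import time
from typing import Callable, List, Optional


class EventThrottle:
    """Hypothetical helper: let the first event through, then at most one per interval."""

    def __init__(self, interval_s: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.interval_s = interval_s
        self.clock = clock
        self._last_sent: Optional[float] = None  # None until the first event is sent

    def should_send(self) -> bool:
        now = self.clock()
        # The very first event is always sent; later ones only once the interval has elapsed.
        if self._last_sent is None or now - self._last_sent >= self.interval_s:
            self._last_sent = now
            return True
        # Events arriving within the interval of the last sent event are dropped.
        return False


# Simulated timestamps in seconds, fed through a fake clock.
ticks = iter([0.0, 10.0, 59.9, 60.0, 61.0, 130.0])
throttle = EventThrottle(clock=lambda: next(ticks))
sent: List[bool] = [throttle.should_send() for _ in range(6)]
print(sent)  # [True, False, False, True, False, True]
```

Measuring the interval from the last *sent* event (rather than the last received one) keeps a steady stream of events from suppressing telemetry indefinitely.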
* added hybrid search example
Added an example of hybrid search for the FAQ pipeline on the COVID dataset
* formatted with black formatter
* renamed document
* fixed
* fixed typos
* added test
added test for hybrid search
* fixed whitespaces
* removed test for hybrid search
* fixed pylint
* commented out logging
* updated hybrid search example
* release notes
* Update hybrid_search_faq_pipeline.py-815df846dca7e872.yaml
* Update hybrid_search_faq_pipeline.py
* mention hybrid search example in release notes
* reduce installed dependencies in examples test workflow
* do not install cuda dependencies
* skip models if API key not set; delete document indices
* keep roberta-base model and inference extra
* pylint
* disable pylint no-logging-basicconfig rule
---------
Co-authored-by: Julian Risch <julian.risch@deepset.ai>
* chore: added on_agent_final_answer support to the Agent callback_manager
* chore: format with black
* run pre-commit to format file
* updated release notes
* reverted sorted imports
---------
Co-authored-by: Stefano Fiorucci <44616784+anakin87@users.noreply.github.com>
* draft split by word, sentence, passage
* naive way to split sentences without nltk
* reno
* add tests
* make input a list of docs; address review feedback
* add source_id and more validation
* update docstrings
* add split delimiters back to strings
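A naive split by word, sentence, or passage that avoids nltk and adds the delimiters back to each piece might look like the sketch below. The `split_text` function and the `DELIMITERS` mapping are hypothetical and only illustrate the idea; they are not the component's actual API or configuration.

```python
from typing import Dict, List

# Hypothetical mapping from split unit to delimiter (an assumption for illustration).
DELIMITERS: Dict[str, str] = {"word": " ", "sentence": ".", "passage": "\n\n"}


def split_text(text: str, split_by: str) -> List[str]:
    """Naively split text on a single delimiter, keeping the delimiter on each piece."""
    if split_by not in DELIMITERS:
        raise ValueError(f"split_by must be one of {sorted(DELIMITERS)}, got {split_by!r}")
    delimiter = DELIMITERS[split_by]
    units = text.split(delimiter)
    # Re-attach the delimiter to every unit except the last, so "".join(pieces) == text.
    pieces = [unit + delimiter for unit in units[:-1]]
    # Skip a trailing empty unit (text ending in the delimiter) to avoid an empty piece.
    if units[-1]:
        pieces.append(units[-1])
    return pieces


print(split_text("First one. Second one.", "sentence"))  # ['First one.', ' Second one.']
```

Splitting on a bare "." is the "naive" part: abbreviations such as "e.g." will be split too, which is the trade-off for dropping the nltk dependency.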
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Add Pipeline.arun()
* Sleeper node
* Fix async running
* Add e2e tests
To run a Pipeline that doesn't have any async node in async mode:
pytest e2e/pipelines/test_standard_pipelines.py::test_query_and_indexing_pipeline
To run a Pipeline that has a single async node in concurrent mode:
pytest e2e/pipelines/test_standard_pipelines.py::test_async_concurrent_complex_pipeline
To run a Pipeline that has a single async node in sequential mode:
pytest e2e/pipelines/test_standard_pipelines.py::test_async_sequential_complex_pipeline
* Remove unused _adispatch_run method
* Make Pipeline.run work with async nodes
* Revert "Make Pipeline.run work with async nodes"
This reverts commit 22d7a94e4d41aca1b59dad18c0b366fbb6e8f431.
* Rename Pipeline.arun to Pipeline._arun
* Enhance docstring
* Add Sleeper docstring
* Add release notes
* ignore typing across the node
* make pylint happy
* skip pylint on needed unused import
* fix
* if a node has an arun method, use it
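The dispatch rule "if a node has an arun method, use it" can be sketched with asyncio as below. This is an illustrative sketch, not the implementation of Pipeline._arun; `SyncNode`, `AsyncNode`, and `run_node` are hypothetical names. For simplicity the synchronous fallback runs inline, which blocks the event loop while `run` executes.

```python
import asyncio
import inspect
from typing import Any, List


class SyncNode:
    def run(self, query: str) -> str:
        return f"sync:{query}"


class AsyncNode:
    def run(self, query: str) -> str:
        return f"sync:{query}"

    async def arun(self, query: str) -> str:
        await asyncio.sleep(0)  # stand-in for real async work
        return f"async:{query}"


async def run_node(node: Any, **kwargs: Any) -> Any:
    """Prefer a coroutine `arun` if the node defines one, else fall back to `run`."""
    arun = getattr(node, "arun", None)
    if arun is not None and inspect.iscoroutinefunction(arun):
        return await arun(**kwargs)
    return node.run(**kwargs)


async def main() -> List[str]:
    return [await run_node(SyncNode(), query="q"), await run_node(AsyncNode(), query="q")]


print(asyncio.run(main()))  # ['sync:q', 'async:q']
```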
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* copy the deps list over from haystack-ai
* fix lazyimport usage
* keep jinja and openai
* fix ci
* reno
* separate out preview unit tests
* fix import error message for tika
* tika
* add preview to all
* wrap torch
* remove comment
* unwrap openai and jinja
* Add TikaFileToDocument component
* Add tests
* Add tika service to CI
* Add release note
* Change name
* PR feedback
* Fix naming in tests
* Fix tika version in CI
* Update tests
---------
Co-authored-by: ZanSara <sara.zanzottera@deepset.ai>
* bug: fix the date_fields request bottleneck
I encountered a performance issue while attempting to index 1 million vectors. Despite the Weaviate instance having low utilization, the process was estimated to take around 10 hours.
After some investigation, I identified the bottleneck: the _get_date_properties function was called for every document, so a request was sent to the Weaviate client and awaited for each document.
To address this, I changed the code to call _get_date_properties only when the schema changes. This reduced the indexing time for the same 1 million vectors to approximately 90 minutes.
* bug: fix the date_fields request bottleneck
* fix: ran the pre-commit hooks for #9341
* Refactor the codebase to use doc_type metadata instead of namespaces to distinguish between documents without embeddings, documents with embeddings, and labels
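The fix for the date_fields bottleneck amounts to caching the date properties and refetching them only after a schema change. The sketch below shows that pattern with a fake client that counts requests; `FakeSchemaClient`, `Writer`, and `notify_schema_change` are hypothetical stand-ins, not WeaviateDocumentStore internals or the weaviate-client API.

```python
from typing import Any, Dict, List, Optional, Set


class FakeSchemaClient:
    """Stand-in for a remote schema API; counts how many requests it receives."""

    def __init__(self, date_props: Set[str]):
        self.date_props = set(date_props)
        self.requests = 0

    def get_date_properties(self, index: str) -> Set[str]:
        self.requests += 1  # each call simulates one network round trip
        return set(self.date_props)


class Writer:
    def __init__(self, client: FakeSchemaClient, index: str):
        self.client = client
        self.index = index
        self._date_fields: Optional[Set[str]] = None  # cached until the schema changes

    def notify_schema_change(self) -> None:
        # Invalidate the cache; the next write refetches the properties once.
        self._date_fields = None

    def _get_date_fields(self) -> Set[str]:
        if self._date_fields is None:
            self._date_fields = self.client.get_date_properties(self.index)
        return self._date_fields

    def write(self, docs: List[Dict[str, Any]]) -> int:
        """Return how many date-typed fields were seen across all documents."""
        converted = 0
        for doc in docs:
            date_fields = self._get_date_fields()  # served from the cache after the first call
            converted += sum(1 for key in doc if key in date_fields)
        return converted


client = FakeSchemaClient({"created_at"})
writer = Writer(client, "Document")
docs = [{"content": "a", "created_at": "2023-01-01"} for _ in range(1000)]
print(writer.write(docs), client.requests)  # 1000 1
```

With the cache, writing 1000 documents issues a single schema request instead of 1000.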
* Fix parameter name in integration test
* Remove commented-out code in the add_type_metadata_filter method
* Fix mypy and pylint checks
* Add release note
* Apply minimal changes: rename method, update method docs and remove redundant method
* Mypy fixes
* Fix docstrings
* Revert helper methods for fetching documents when the number of documents exceeds the Pinecone limit
* Remove unnecessary attributes in PineconeDocumentStore
* Fix unit test
---------
Co-authored-by: Ivana Zeljkovic <ivana.zeljkovic@smartcat.io>
Co-authored-by: DosticJelena <jelena.dostic@smartcat.io>