* Fix TikaConverter not emitting the \f page separator: run Tika in HTML mode, then convert the HTML to text, using the old Haystack 1.x integration as a template.
* Add Reno
* Fix test by making Mock Tika return XML (before parsing)
* refinements and test
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
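
A minimal sketch of the HTML-to-text step described in the TikaConverter fix above, not the actual Haystack code. It assumes Tika's XHTML output wraps each page in a `<div class="page">` element and that those divs are not nested. `PageTextParser` and `xhtml_to_text` are hypothetical names used only for illustration.

```python
from html.parser import HTMLParser


class PageTextParser(HTMLParser):
    """Hypothetical helper: collects the text found inside each <div class="page">."""

    def __init__(self):
        super().__init__()
        self.in_page = False  # True while we are inside a page div
        self.current = ""     # text accumulated for the current page
        self.pages = []       # finished pages, in document order

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) tuples
        if tag == "div" and ("class", "page") in attrs:
            self.in_page = True

    def handle_endtag(self, tag):
        # Assumption: page divs contain no nested divs, so the first
        # closing </div> seen while in a page ends that page.
        if tag == "div" and self.in_page:
            self.in_page = False
            self.pages.append(self.current)
            self.current = ""

    def handle_data(self, data):
        if self.in_page:
            self.current += data


def xhtml_to_text(xhtml: str) -> str:
    """Join the text of all pages with the form feed character (\\f)."""
    parser = PageTextParser()
    parser.feed(xhtml)
    parser.close()
    return "\f".join(parser.pages)
```

With this approach, downstream components that split on `\f` can recover page boundaries from the plain text.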
* Fix issue that could lead to RCE when using insecure Jinja templates
* Add comment explaining exception suppression
* Update release note
* Update release note
* Simplify lg + standardize
* Format
* Update formatting
* Fix formatting again
* Fix empty line
* Change formatting
* Format with black
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* Fix bug in DocumentSplitter and expand tests to catch said bug
* Fix split overlap information calc and actually test it
* Add release notes
* Remove comments
* Same fix in SentenceWindowRetrieval
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* Pin structlog to 24.2.0 due to unit test failures
* Remove object init parameter in huggingface_hub unit tests
* Use less restrictive structlog pin
* Add release note
* Fix bug in Pipeline.run() that executed Components in an unexpected, incorrect order
* Update haystack/core/pipeline/base.py
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Move utility functions from _enqueue_next_runnable_component (#7895)
* Isolate logic to check if we're stuck in a loop
* Simplify for else
* Add missing return in docstring
* Emit warning when stuck in a loop
* Fix docstring
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* Add utility function to move Components in queues
* Add function to find next Component to run
* Comment update
* Add missing break in loop
* Make _add_missing_input_defaults less error prone and add tests
* Fix tests
* Update docstring
* Simplify enqueue logic
* Remove unused _enqueue_next_runnable_component function
* Add method to find Component with lazy variadic input or all inputs with defaults
* Simplify _find_next_runnable_lazy_variadic_or_default_component
* Remove unnecessary type ignore
* Split _dequeue_components_that_received_no_input into separate functions
* Fix linting
* Simplify variadic check when running Component
* Simplify code
* Reorganize functions used by Pipeline.run
* Rename variables used in Pipeline.run() for clarity
* Add comment clarifying last_waiting_queue and before_last_waiting_queue
* Add functions to easily update waiting_queue
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* initial support for api_params
* add tests and reno
* resolve suggestions and add integration test
* fix mypy
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* initial import
* adding tests
* adding license and release notes
* adding missing release notes
* working with any type of doc store
* nit
* adding get_class_object to serialization package
* nit
* refactoring get_class_object()
* refactoring get_class_object()
* changing type and var names
* more refactoring
* Update haystack/core/serialization.py
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* Update haystack/core/serialization.py
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* Update test/core/test_serialization.py
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* more refactoring
* more refactoring
* Pydoc syntax
---------
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* Fix from_dict to work if device isn't provided in init params
* Minor refactoring of from_dict for components that load HF models
* Add tests
* Update tests to test loading with all default parameters
* Add more tests
* Add release notes
* Add unit test for whisper local
* Update reno
* Add fix for ExtractiveReader
* Fix NamedEntityExtractor
* Fix default value for huggingface_pipeline_kwargs
* Add reno note
* Update HuggingFaceLocalGenerator.from_dict to use the same logic as HuggingFaceLocalChatGenerator.from_dict
* Update tests slightly
* Update release note
max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable, or defaults to 5.
timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable, or defaults to 30.
Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
* feat: Configure max_retries & timeout for AzureOpenAIDocumentEmbedder
max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable, or defaults to 5.
timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable, or defaults to 30.
Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
* Update retries-and-timeout-for-AzureOpenAIDocumentEmbedder-006fd84204942e43.yaml
* Update haystack/components/embedders/azure_document_embedder.py
* Update haystack/components/embedders/azure_document_embedder.py
---------
Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* feat: Configure max_retries & timeout for AzureOpenAIChatGenerator
max_retries: if not set, it is read from the OPENAI_MAX_RETRIES
env variable, or defaults to 5.
timeout: if not set, it is read from the OPENAI_TIMEOUT
env variable, or defaults to 30.
Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
* Update haystack/components/generators/chat/azure.py
* Update haystack/components/generators/chat/azure.py
* Update max_retries-for-AzureOpenAIChatGenerator-9e49b4c7bec5c72b.yaml
---------
Signed-off-by: Nitanshu Vashistha <nitanshu.vzard@gmail.com>
Co-authored-by: David S. Batista <dsbatista@gmail.com>
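
The fallback order for max_retries and timeout described in the commits above can be sketched as follows. This is an illustration under stated assumptions, not the code in the Azure components: `resolve_client_settings` is a hypothetical helper, and only the environment variable names and the defaults (5 and 30) come from the commit messages.

```python
import os
from typing import Optional, Tuple


def resolve_client_settings(
    max_retries: Optional[int] = None,
    timeout: Optional[float] = None,
) -> Tuple[int, float]:
    """Hypothetical helper: explicit argument, then env variable, then default."""
    if max_retries is None:
        # Fall back to OPENAI_MAX_RETRIES, or 5 if the variable is unset
        max_retries = int(os.environ.get("OPENAI_MAX_RETRIES", "5"))
    if timeout is None:
        # Fall back to OPENAI_TIMEOUT, or 30 if the variable is unset
        timeout = float(os.environ.get("OPENAI_TIMEOUT", "30.0"))
    return max_retries, timeout
```

An explicitly passed value always wins, so the environment variables only act as deployment-wide defaults.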