* Add optional_variables in ConditionalRouter
* Add reno note
* Add more unit tests with various complex scenarios
* Add more unit tests
* Add pylint disable=too-many-positional-arguments
* PR feedback from @sjrl
* Use token instead of use_auth_token because of deprecation warning
* Fix test
* pylint
* fix linting
---------
Co-authored-by: Stefano Fiorucci <stefanofiorucci@gmail.com>
* feat: be tolerant to exceptions
if the OpenAI API raises an error, don't fail the entire processing run
* fix: missing import, string separator
* Enhance error handling
* Use batched from more_itertools for compatibility with older Python versions
* Fix batching and add test
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
* feat: added split by line to DocumentSplitter
* fix: pr review comments
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
---------
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Parametrize document joiner tests with empty lists
* Skip loop in _distribution_based_rank_fusion if document list is empty
* Parametrize test_empty_list with join_mode
* Prevent division by zero in _merge and _reciprocal_rank_fusion
* Add release notes
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
* Add log lines for PDF conversion and make skipping more explicit in DocumentSplitter
* Add logging statement for PDFMinerToDocument as well
* Add tests
* Remove unused line
* Remove unused line
* add reno
* Add in PDF file
* Update checks in PDF converters and add tests for document splitter
* Revert
* Remove line
* Fix comment
* Make mypy happy
* Make mypy happy
* initial import
* Update haystack/components/generators/openai.py
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* docs: fixing
* supporting the three use cases: no system prompt, using system prompt defined at init, using system prompt defined at run time
* renaming 'run_time_system_prompt' to 'system_prompt'
* adding tests, converting methods to static
---------
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* reduced usage of numpy and substituted built-in libraries
* added release note
* edited expit function to support both float and list inputs (the list case was causing a CI error)
* revert code; numpy can't be removed here
* more cleaning
* fix relnote
---------
Co-authored-by: anakin87 <stefanofiorucci@gmail.com>
* add config_kwargs
* disable PLR0913 for a specific function
* add a release note
* refer to AutoConfig in config_kwargs docstring
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
Co-authored-by: Julian Risch <julianrisch@gmx.de>
* draft new component and tests
* draft new component and tests
* fix tests, replace usage of get_attr
* improve docstrings, refactor tests
* add test for mixed documents with and without scores
* add test with multiple lists and update docstring
* validate inputs, add tests, make methods static
* change fallback to binary relevance
* rename validate_init_parameters to validate_inputs
* fix: make `from_dict` of `PyPDFToDocument` more robust
* chore: drop trailing space
* converting method to static and making the comment shorter
* reverting method to static
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* Add JSONConverter Component
* Handle some corner cases
* Add JSONConverter to pydoc config
* Add a way to extract all non content fields as metadata
* Small fix in docstring
* Fix tests
* Update docstrings
* Update json.py
---------
Co-authored-by: Daria Fokina <daria.fokina@deepset.ai>
* Port NLTKDocumentSplitter from dC to Haystack
* Improve pydocs
* Use haystack logging
* Add NLTKDocumentSplitter to __init__.py
* Use haystack logging, rename test classes
* Fixing _needs_join return
* Linting
* PR feedback
* More static methods
* Increase test coverage
* Compile pattern
---------
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* changing default model to gpt-4o-mini
* adding release notes
* fixing some missed tests
* fixing some more missed tests
* fixing one last missed test
* fixing linting issues
* making pylint happy about an end2end test
* changing if test to walrus operator
* fixing azure embedder from ada to text-embedding-ada-002
* Adding splitting function
* Adding test for split by function
* Adding release note for feat adding split by function
* Fixing release note for split_by_function
* Fixing issue with splitting_function non callable
* nit: fixing value error in DocumentSplitter for split_by
* Add custom serde
---------
Co-authored-by: Giovanni Alzetta <giovannialzetta@gmail.com>
Co-authored-by: Vladimir Blagojevic <dovlex@gmail.com>
* fix: Prevent the usage of `set_input_type(s)` when the `run` method doesn't have kwargs,
raise if `set_input_type(s)` overrides `run` method parameters
* fix: update components and tests
* reno