* first fucntioning DocxFileToDocument
* fix lazy import message
* add reno
* Add license headder
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* change DocxFileToDocument to DocxToDocument
* Update library install to the maintained version
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* clan try-exvept to only take non haystack errors into account
* Add wanring on docstring of component ignoring page brakes, mark test as skip
* make warnings lazy evaluations
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* make warnings lazy evaluations
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Make warnings lazy evaluated
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Solve f bug
* Get more metadata from docx files
* add 'python-docx' dependency and docs
* Change logging import
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Fix typo
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* remake metadata extraction for docx
* solve bug regarding _get_docx_metadata method
* Update haystack/components/converters/docx.py
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Update haystack/components/converters/docx.py
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Delete unused test
---------
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Add `missing_meta` param to `MetaFieldRanker`, plus checks for validation.
* Implement `missing_meta` functionality in `run()`.
* Finish first draft of revised `MetaFieldRanker` functionality.
* Add tests for `MetaFieldRanker` `missing_meta` functionality.
* Add `missing_meta` param to `MetaFieldRanker`, plus checks for validation.
* Implement `missing_meta` functionality in `run()`.
* Finish first draft of revised `MetaFieldRanker` functionality.
* Add tests for `MetaFieldRanker` `missing_meta` functionality.
* Add release notes for new `missing_meta` param of `MetaFieldRanker`
* Move part of docs_missing_meta_field warning string outside of `if...elif...else`.
* add remove_component method plus unit tests
* add docstrings
* add reno
* add type annotation to remove_component method
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
* solve bug not allowing a component to be reatached to a pipeline after being removed
* Properly remove Component from Pipeline
* Ignore mypy
---------
Co-authored-by: Madeesh Kannan <shadeMe@users.noreply.github.com>
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
* Add first pass at PPTXToDocument converter
* Add test and update code
* Add doc string
* Update docstrings
* Add release notes
* remove unused imports, add to api docs, update pyproject.toml
* Add a new test
* Add dep so tests can run
* Fix running Pipeline with conditional branch and Component with default inputs
* Add release notes
* Change arg name of _init_to_run so it's clearer
* Enhance release note
* Rework boilerplate function that run Pipeline in scenarios testing
* Update tests to use new dataclasses
* Update README.md to reflect dataclass changes
* Use absolute import from conftest
* add enviroment variables to the _enviroment.py file
* add support for two of the three variables
* Add support for 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' on OpenAIDocument Ebedder.
* Replicate support for env vars in OpenAITextEmbedder.
* Add support for env vars in OpenAIGenerator..
* Add support for env vars in OpenAIChatGenerator.
* add docstrings and reno
* add params to __init__ in OpenAIDocumentEmbedder
* add params to __init__ in OpenAITextEmbedder
* make fully functional implementation of env vars and unit tests
* update reno
* Update haystack/components/embedders/openai_text_embedder.py
* reverse changes to telemetry/_enviroment.py
* Update haystack/components/embedders/openai_text_embedder.py
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
* move graph initialization to the base class
* simplify data normalization
* deepcopy data in base class
* initialize inputs state
* move to_run preparation to the base class
* Test Pipeline._init_to_run()
* Test Pipeline._init_inputs_state()
* Test Pipeline._prepare_component_input_data()
---------
Co-authored-by: Silvano Cerza <silvanocerza@gmail.com>
* fix: Update device deserializtion for SentenceTransformersTextEmbedder
* Add unit test
* Fix unit test
* Make same change to doc embedder
* Add release notes
* Add same change to Diversity Ranker and Named Entity Extractor
* Add unit test
* Add the same for whisper local
* Update release notes
* incorporating better bm25 impl without breaking interface
* all three bm25 algos
* 1. setting algo post-init not allowed; 2. remove extra underscore for naming consistency; 3. remove unused import
* 1. rename attribute name for IDF computation 2. organize document statistics as a dataclass instead of tuple to improve readability
* fix score type initialization (int -> float) to pass mypy check
* release note included
* fixing linting issues and mypy
* fixing tests
* removing heapq import and cleaning up logging
* changing indexing order
* adding more tests
* increasing tests
* removing rank_bm25 from pyproject.toml
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>