* first functioning DocxFileToDocument
* fix lazy import message
* add reno
* Add license header
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* change DocxFileToDocument to DocxToDocument
* Update library install to the maintained version
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* clean try-except to only take non-haystack errors into account
* Add warning to docstring of component about ignoring page breaks, mark test as skip
* make warnings lazy evaluations
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Make warnings lazy evaluated
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Solve f bug
* Get more metadata from docx files
* add 'python-docx' dependency and docs
* Change logging import
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Fix typo
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* remake metadata extraction for docx
* solve bug regarding _get_docx_metadata method
* Update haystack/components/converters/docx.py
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Update haystack/components/converters/docx.py
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Delete unused test
---------
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Add `missing_meta` param to `MetaFieldRanker`, plus checks for validation.
* Implement `missing_meta` functionality in `run()`.
* Finish first draft of revised `MetaFieldRanker` functionality.
* Add tests for `MetaFieldRanker` `missing_meta` functionality.
* Add release notes for new `missing_meta` param of `MetaFieldRanker`
* Move part of docs_missing_meta_field warning string outside of `if...elif...else`.
* Add first pass at PPTXToDocument converter
* Add test and update code
* Add doc string
* Update docstrings
* Add release notes
* remove unused imports, add to api docs, update pyproject.toml
* Add a new test
* Add dep so tests can run
* add environment variables to the _environment.py file
* add support for two of the three variables
* Add support for 'OPENAI_TIMEOUT' and 'OPENAI_MAX_RETRIES' in OpenAIDocumentEmbedder.
* Replicate support for env vars in OpenAITextEmbedder.
* Add support for env vars in OpenAIGenerator.
* Add support for env vars in OpenAIChatGenerator.
* add docstrings and reno
* add params to __init__ in OpenAIDocumentEmbedder
* add params to __init__ in OpenAITextEmbedder
* make fully functional implementation of env vars and unit tests
* update reno
* Update haystack/components/embedders/openai_text_embedder.py
* revert changes to telemetry/_environment.py
* Update haystack/components/embedders/openai_text_embedder.py
---------
Co-authored-by: Massimiliano Pippi <mpippi@gmail.com>
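
For illustration, the env-var support described above can be sketched with the standard library alone. This is a minimal sketch and not the code from the commits: the helper name `resolve_openai_settings`, the precedence order (explicit argument, then environment variable, then default), and the fallback values are all assumptions made for the example. Only the variable names `OPENAI_TIMEOUT` and `OPENAI_MAX_RETRIES` come from the commit messages.

```python
import os
from typing import Mapping, Optional, Tuple

# Fallback values are assumptions made for this sketch only.
DEFAULT_TIMEOUT = 30.0
DEFAULT_MAX_RETRIES = 5


def resolve_openai_settings(
    timeout: Optional[float] = None,
    max_retries: Optional[int] = None,
    env: Optional[Mapping[str, str]] = None,
) -> Tuple[float, int]:
    """Hypothetical helper: explicit arguments win, then env vars, then defaults."""
    # Allow a custom mapping to be injected so the logic is easy to test.
    env = os.environ if env is None else env
    if timeout is None:
        timeout = float(env.get("OPENAI_TIMEOUT", DEFAULT_TIMEOUT))
    if max_retries is None:
        max_retries = int(env.get("OPENAI_MAX_RETRIES", DEFAULT_MAX_RETRIES))
    return timeout, max_retries
```

Passing the environment in as a mapping keeps the lookup free of global state in unit tests; a component's `__init__` could call such a helper with its own `timeout` and `max_retries` arguments.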
* fix: Update device deserialization for SentenceTransformersTextEmbedder
* Add unit test
* Fix unit test
* Make same change to doc embedder
* Add release notes
* Add same change to Diversity Ranker and Named Entity Extractor
* Add unit test
* Add the same for whisper local
* Update release notes
* Update huggingface_hub classes used after library upgrade
* Fix chat tests
* Update lazy import guard and other references to huggingface_hub>=0.23.0
* In huggingface_hub 0.23.0 TextGenerationOutput property details is now optional
* More fixes
* Add reno note
* calculate page number of answer and add to meta
* fix mypy, add reno
* add test
* simplify unit test
* update release note
* undo @patch updates
* extend tests, check page_number type
* Initial commit pdfminer converter
* Revert back naming of argument all_text per pdfminer documentation
* Add the component decorator
* Add release notes
* Reformat code with black
* Remove LTPage and comments
* Update dependencies in pyproject.toml
* Added some tests and incorporated reference doc in docstring
* Add the implementation for page counting used in the v1.25.x branch. It should work as expected in issue #6705.
* Add tests that reflect the desired behaviour. This behaviour is inferred from the one it had on Haystack 1.x
Solve some minor bugs spotted by tests.
* Update docstrings.
* Add reno.
* Update haystack/components/preprocessors/document_splitter.py
Update docstring from suggestion
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* solve suggestion to improve readability
* fragment tests
* Update haystack/components/preprocessors/document_splitter.py
Co-authored-by: David S. Batista <dsbatista@gmail.com>
* Update .gitignore
* Update .gitignore
* Update add-page-number-to-document-splitter-162e9dc7443575f0.yaml
* blackening
---------
Co-authored-by: David S. Batista <dsbatista@gmail.com>
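
As a rough illustration of the page counting mentioned above, one possible approach is to track page breaks across consecutive splits. This is a minimal sketch under stated assumptions, not the implementation from the commits: it assumes pages are delimited by form feed characters (`\f`), that each split is labelled with the page on which it starts, and the function name `assign_page_numbers` is hypothetical.

```python
from typing import List, Tuple


def assign_page_numbers(splits: List[str]) -> List[Tuple[str, int]]:
    """Hypothetical helper: pair each split with the page on which it starts."""
    page = 1  # pages are numbered from 1
    numbered = []
    for split in splits:
        numbered.append((split, page))
        # Assumption: every form feed inside a split marks one page break,
        # so later splits start that many pages further on.
        page += split.count("\f")
    return numbered


splits = ["Page one text.\f", "Page two, part a. ", "Page two, part b.\fPage three.", "More page three."]
pages = [page for _, page in assign_page_numbers(splits)]
```

Because the counter only advances after a split has been labelled, a split that spans a page break keeps the number of the page where it begins.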
The new `EvaluationRunResult` has slightly different semantics: it separates the previous `data` parameter into `inputs` and `results` and expects aggregate scores to be provided in the latter.
* fix eval metric docstrings, change type of individual scores
* change import order
* change exactmatch docstring to single ground truth answer
* change exactmatch comment to single ground truth answer
* reverted changing docs to single ground truth
* add warm up in SASEvaluator example
* fix FaithfulnessEvaluator docstring example
* extend FaithfulnessEvaluator docstring example
* Update FaithfulnessEvaluator init docstring
* Remove outdated default from LLMEvaluator docstring
* Add examples param to LLMEvaluator docstring example
* Add import and print to LLMEvaluator docstring example
* Initial commit
* Remove old mock tests
* Fix current_last_page_number calculation
* Carry over unit tests from the other side
* Update pydocs, skip failing tests
* Fix pylint and mypy
* Minor adjustments
* Add release note
* Minor touch ups
* Resolve Document unique id issue by using custom id calculation
* Better hashing, add unit tests
* Small fixes