* Some changes
Use tests file path
* Update tests
* Add another unit test
* Shorten _get_docx_metadata
* Update tests
* Remove try block
* Add a dataclass
* Add a to dict unit test
* Remove unused import
* Add release notes
* Update docstrings
* Use optional instead of pipe
* Update docstring
* Remove file
* first functioning DocxFileToDocument
* fix lazy import message
* add reno
* Add license header
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* change DocxFileToDocument to DocxToDocument
* Update library install to the maintained version
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* clean try-except to only take non-haystack errors into account
* Add warning to docstring about component ignoring page breaks, mark test as skip
* make warnings lazy evaluations
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* make warnings lazy evaluations
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Make warnings lazy evaluated
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Solve f bug
* Get more metadata from docx files
* add 'python-docx' dependency and docs
* Change logging import
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Fix typo
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* remake metadata extraction for docx
* solve bug regarding _get_docx_metadata method
* Update haystack/components/converters/docx.py
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Update haystack/components/converters/docx.py
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Delete unused test
---------
Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
* Add first pass at PPTXToDocument converter
* Add test and update code
* Add doc string
* Update docstrings
* Add release notes
* remove unused imports, add to api docs, update pyproject.toml
* Add a new test
* Add dep so tests can run
* Initial commit pdfminer converter
* Revert back naming of argument all_text per pdfminer documentation
* Add the component decorator
* Add release notes
* Reformat code with black
* Remove LTPage and comments
* Update dependencies in pyproject.toml
* Added some tests and incorporated reference doc in docstring
* Initial commit
* Remove old mock tests
* Fix current_last_page_number calculation
* Carry over unit tests from the other side
* Update pydocs, skip failing tests
* Fix pylint and mypy
* Minor adjustments
* Add release note
* Minor touch ups
* Resolve Document unique id issue by using custom id calculation
* Better hashing, add unit tests
* Small fixes