4 Commits

Author SHA1 Message Date
Sebastian Husch Lee
6836079686
chore: Capitalize DOCX in DOCXToDocument converter (#7931)
* Capitalize DOCX in DOCXToDocument converter

* Update docstrings

* Update test class name

* add releease notes
2024-06-27 08:19:01 +02:00
Sebastian Husch Lee
3db56d9066
refactor: DocxToDocument update (#7857)
* Some changes

Use tests file path

* Update tests

* Add another unit test

* Shorten _get_docx_metadata

* Update tests

* Remove try block

* Add a dataclass

* Add a to dict unit test

* Remove unused import

* Add release notes

* Update docstrings

* Use optional instead of pipe

* Update docstring

* Remove file
2024-06-19 15:48:31 +02:00
Stefano Fiorucci
8de639bd70
DocxDocument forward reference (#7852) 2024-06-13 11:29:31 +02:00
Carlos Fernández
c1c339923f
feat: add DocxToDocument converter (#7838)
* first fucntioning DocxFileToDocument

* fix lazy import message

* add reno

* Add license headder

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* change DocxFileToDocument to DocxToDocument

* Update library install to the maintained version

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* clan try-exvept to only take non haystack errors into account

* Add wanring on docstring of component ignoring page brakes, mark test as skip

* make warnings lazy evaluations

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* make warnings lazy evaluations

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Make warnings lazy evaluated

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Solve f bug

* Get more metadata from docx files

* add 'python-docx' dependency and docs

* Change logging import

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Fix typo

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* remake metadata extraction for docx

* solve bug regarding _get_docx_metadata method

* Update haystack/components/converters/docx.py

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Update haystack/components/converters/docx.py

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>

* Delete unused test

---------

Co-authored-by: Sebastian Husch Lee <sjrl@users.noreply.github.com>
2024-06-12 11:58:36 +02:00