haystack/releasenotes/notes/add-logs-empty-files-pdf-f28a14e52984c1ea.yaml
Sebastian Husch Lee 911f3523ab
feat: Increase logging transparency for empty Documents during conversion (#8509)
* Add log lines for PDF conversion and make skipping more explicit in DocumentSplitter

* Add logging statement for PDFMinerToDocument as well

* Add tests

* Remove unused line

* Remove unused line

* add reno

* Add in PDF file

* Update checks in PDF converters and add tests for document splitter

* Revert

* Remove line

* Fix comment

* Make mypy happy

* Make mypy happy
2024-11-04 09:26:57 +01:00


---
features:
  - |
    Add warning logs to the PDFMinerToDocument and PyPDFToDocument to indicate when a processed PDF file has no content.
    This can happen if the PDF file is a scanned image.
    Also add an explicit check and warning message to the DocumentSplitter that warns the user that empty Documents are skipped.
    This behavior was already occurring, but now it's clearer through logs that this is happening.