mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-06 16:42:42 +00:00

This PR fixes failures when processing `docx` files that contain complex, recursive, merged, or malformed tables. When rows are missing or improperly merged, some cells cannot be traced back to a valid `<w:tc>` element as `python-docx` requires, and accessing `row.cells` can raise a `ValueError` because `python-docx` fails to resolve the full logical table layout. This PR wraps those calls in `try/except` so that problematic rows are skipped while usable content is still extracted from the rest of the document.
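
The skip-on-error pattern described above can be sketched roughly as follows. This is an illustration only, not the code from this PR: `iter_row_texts`, `FakeCell`, `GoodRow`, and `BrokenRow` are hypothetical names, and the stand-in row classes only imitate the `cells` attribute of a `python-docx` row so that the example runs without a real document.

```python
from typing import Iterable, Iterator, List


def iter_row_texts(rows: Iterable) -> Iterator[List[str]]:
    """Yield the cell texts of each row, skipping rows whose cells cannot be resolved."""
    for row in rows:
        try:
            # Assumption: as in the PR description, reading `cells` may raise
            # ValueError when the table layout cannot be resolved.
            cells = row.cells
        except ValueError:
            # Skip the problematic row and keep extracting from the rest.
            continue
        yield [cell.text for cell in cells]


# Hypothetical stand-ins for python-docx objects, used only for this demo.
class FakeCell:
    def __init__(self, text: str) -> None:
        self.text = text


class GoodRow:
    def __init__(self, *texts: str) -> None:
        self.cells = [FakeCell(t) for t in texts]


class BrokenRow:
    @property
    def cells(self):
        # Simulates a row whose layout cannot be resolved.
        raise ValueError("simulated unresolved table layout")


extracted = list(iter_row_texts([GoodRow("a", "b"), BrokenRow(), GoodRow("c")]))
```

Here `extracted` holds the text of the two well-formed rows, and the broken row is dropped silently. Catching only `ValueError` rather than a bare `except` keeps unrelated errors visible.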