2 Commits

Author SHA1 Message Date
cragwolfe
d0749d181f
fix: avoid PDF sorting error on negative coords (#1361)
The default sorting algorithm for PDF's, "xycut," would cause an error
when partitioning a document if Y coordinate points were negative. This
change checks for that condition (or more broadly, any negative
coordinates) and falls back to the "basic" sort if that is the case.

This PR does not address the underlying issue of "bad points" which
still should be investigated. However, the sorting code should be less
brittle to unexpected bounding boxes in the first case.

Resolves: https://github.com/Unstructured-IO/unstructured/issues/1296
2023-09-10 19:29:49 -07:00
Yao You
9191be7ae8
[issue 1237] fix empty coordinates break sorting bug (#1242)
This PR resolves #1237 by checking if any coordinates are `None`; if yes
do not attempt to sort with xy cut method and return the list as is.
2023-09-01 03:15:10 +00:00