unstructured/test_unstructured_ingest/expected-structured-output/local-single-file-with-pdf-infer-table-structure
Christine Straub 237d04c896
feat: improve natural reading order by filtering OCR results (#1768)
### Summary
Some `OCR` elements with only spaces in the text have full-page width in
the bounding box, which causes the `xycut` sorting to not work as
expected. Now the logic to parse OCR results removes any elements with
only spaces (more than one space).

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
2023-10-16 23:05:55 +00:00
..