mirror of
https://github.com/Unstructured-IO/unstructured.git
The purpose of this PR is to use the newly created `is_extracted` parameter in `TextRegion` (and the corresponding vector version `is_extracted_array` in `TextRegions`), flagging elements that were extracted directly from PDFs as such.

This also involved:
- New tests
- A version update to bring in the new `unstructured-inference`
- An ingest fixtures update
- An optimization from Codeflash that's not directly related

One important thing to review is that all avenues by which an element is extracted and ends up in the output of a partition are covered... fast, hi_res, etc.

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: luke-kucing <luke@unstructured.io>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: qued <qued@users.noreply.github.com>
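
For reviewers, the relationship between the per-region flag and its vector counterpart can be sketched roughly as follows. This is only an illustration under stated assumptions: `SketchTextRegion`, `SketchTextRegions`, and `flag_extracted` are hypothetical stand-ins, not the real `unstructured-inference` classes, and the flag is assumed here to be an optional boolean, which may not match the actual type.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical stand-in for a single region; NOT the real TextRegion.
@dataclass
class SketchTextRegion:
    text: str
    # Assumption: None means "provenance unknown", True means the text
    # came directly from the PDF rather than from OCR.
    is_extracted: Optional[bool] = None


# Hypothetical stand-in for the vectorized container; NOT the real TextRegions.
@dataclass
class SketchTextRegions:
    texts: List[str]
    is_extracted_array: List[Optional[bool]] = field(default_factory=list)

    def __post_init__(self) -> None:
        # Default to "unknown" for every region when no flags are given.
        if not self.is_extracted_array:
            self.is_extracted_array = [None] * len(self.texts)
        if len(self.is_extracted_array) != len(self.texts):
            raise ValueError("is_extracted_array must match texts in length")

    def as_list(self) -> List[SketchTextRegion]:
        # Expand the vector form back into per-region objects.
        return [
            SketchTextRegion(text=t, is_extracted=flag)
            for t, flag in zip(self.texts, self.is_extracted_array)
        ]


def flag_extracted(regions: SketchTextRegions) -> SketchTextRegions:
    """Return a copy in which every region is marked as extracted."""
    return SketchTextRegions(
        texts=list(regions.texts),
        is_extracted_array=[True] * len(regions.texts),
    )
```

Under these assumptions, each partitioning path that pulls text straight from the PDF would apply something like `flag_extracted` before its regions are merged into the output, which is why coverage of every path is worth checking in review.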