This PR uses the newly added `is_extracted` parameter in `TextRegion`
(and its vectorized counterpart, `is_extracted_array`, in `TextRegions`)
to flag elements that were extracted directly from PDFs.
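A minimal sketch of how a per-region flag and its vectorized counterpart can stay index-aligned. `Region` and `Regions` are hypothetical stand-ins, not the actual `unstructured-inference` `TextRegion`/`TextRegions` definitions. The sketch assumes only that the vector form stores one flag per region, in the same order as the regions.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Region:
    """Hypothetical stand-in for a single text region (not the real TextRegion)."""
    text: str
    # True when the text came directly from the PDF's embedded text layer;
    # None when the source is unknown.
    is_extracted: Optional[bool] = None


@dataclass
class Regions:
    """Hypothetical stand-in for a vectorized collection (not the real TextRegions)."""
    texts: List[str] = field(default_factory=list)
    # One flag per region, index-aligned with `texts`.
    is_extracted_array: List[Optional[bool]] = field(default_factory=list)

    @classmethod
    def from_list(cls, regions: List[Region]) -> "Regions":
        # Pack the per-region flags into the parallel array.
        return cls(
            texts=[r.text for r in regions],
            is_extracted_array=[r.is_extracted for r in regions],
        )

    def as_list(self) -> List[Region]:
        # Unpack back into individual regions, carrying each flag along.
        return [
            Region(text=t, is_extracted=flag)
            for t, flag in zip(self.texts, self.is_extracted_array)
        ]


regions = [Region("Embedded title", True), Region("OCR'd paragraph", False)]
packed = Regions.from_list(regions)
print(packed.is_extracted_array)
```

The point of the round trip is that any conversion between the two forms must carry the flag along; a path that drops it would silently lose the "extracted" marking.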
This also involved:
- New tests
- A version bump to pull in the new `unstructured-inference` release
- An ingest fixtures update
- An unrelated optimization from Codeflash
One important thing to review: confirm that every path by which an
element is extracted and ends up in a partition's output is covered
(`fast`, `hi_res`, etc.).
---------
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: codeflash-ai[bot] <148906541+codeflash-ai[bot]@users.noreply.github.com>
Co-authored-by: luke-kucing <luke@unstructured.io>
Co-authored-by: Claude <noreply@anthropic.com>
Co-authored-by: qued <qued@users.noreply.github.com>