mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-11-03 11:34:07 +00:00
Closes #2212. ### Summary This PR implements logic to fall back to the "inferred_layout + OCR" if pdfminer fails in the `hi_res` pipeline (discussed in[ this slack channel](https://unstructuredw-kbe4326.slack.com/archives/C057R3F8F7A/p1701807299018929). ### Testing PDF: [NASA-SNA-8-D-027III-Rev2-CsmLmSpacecraftOperationalDataBook-Volume3-MassProperties-pg856.pdf](https://github.com/Unstructured-IO/unstructured/files/13554149/NASA-SNA-8-D-027III-Rev2-CsmLmSpacecraftOperationalDataBook-Volume3-MassProperties-pg856.pdf) ``` elements = partition_pdf( filename="NASA-SNA-8-D-027III-Rev2-CsmLmSpacecraftOperationalDataBook-Volume3-MassProperties-pg856.pdf", strategy="hi_res", ) ``` --------- Co-authored-by: christinestraub <christinemstraub@gmail.com>