mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-09-07 15:50:07 +00:00

This PR skips element sorting when determining whether embedded text can be extracted. The elements extracted in this step are returned as final elements only by the `fast` strategy pipeline; the other strategy pipelines (`hi_res`, `ocr`) never use them. Removing the sort from this step and applying it later, inside the `fast` strategy pipeline, avoids unnecessary work and reduces execution time.

### Summary

- Skip element sorting when determining whether embedded text can be extracted.
- Add the `_partition_pdf_with_pdfparser()` function for the `fast` strategy pipeline.

### Testing

CI should pass.
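
For reviewers, the idea behind the change can be illustrated with a minimal sketch. It is not the code in this PR: `Element`, `extract_elements`, `sort_elements`, `is_text_extractable`, and `partition_fast` are hypothetical names invented for the example, and the sort key (page, then vertical position, then horizontal position) is an assumption about what "sorting" means here. The point is only that the extractability check does not depend on order, so the sort can be deferred to the one pipeline that returns these elements.

```python
from dataclasses import dataclass


@dataclass
class Element:
    """Hypothetical stand-in for an extracted text element."""
    text: str
    page: int
    y: float  # assumed: distance from the top of the page
    x: float  # assumed: distance from the left edge


def extract_elements(raw):
    """Build elements from raw (text, page, y, x) tuples without sorting."""
    return [Element(*item) for item in raw]


def sort_elements(elements):
    """Assumed reading order: by page, then top to bottom, then left to right."""
    return sorted(elements, key=lambda e: (e.page, e.y, e.x))


def is_text_extractable(elements):
    """Order-independent check: is there any non-blank embedded text?"""
    return any(e.text.strip() for e in elements)


def partition_fast(raw):
    """Sketch of a fast path: check first, sort only if the result is returned."""
    elements = extract_elements(raw)  # unsorted
    if not is_text_extractable(elements):
        return []  # placeholder: a real pipeline would fall back elsewhere
    return sort_elements(elements)  # the sort is paid for only on this path


raw = [("b", 1, 200.0, 10.0), ("a", 1, 100.0, 10.0), ("c", 2, 50.0, 10.0)]
print([e.text for e in partition_fast(raw)])
```

In this sketch, a caller that only needs the yes/no answer from `is_text_extractable` never triggers a sort, which mirrors the motivation stated in the description.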