mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-09-25 08:19:22 +00:00

This PR vectorizes the computation of element overlap to speed up the deduplication of extracted elements.

## test

This PR adds unit tests for the new vectorized IOU and subregion computation functions.

In addition, running partition on large files with many elements, like this slide: [002489.pdf](https://github.com/user-attachments/files/16823176/002489.pdf), shows a reduction in runtime from around 15 min on the main branch to less than 4 min with this branch. Profiling results show that the new implementation greatly reduces the time cost of the overlap computation, and most of the time is now spent on getting the coordinates from a list of bboxes.
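For readers unfamiliar with the technique, the following is a minimal sketch of what a vectorized IOU and subregion check can look like with NumPy broadcasting. It is not the code from this PR: the function names `pairwise_iou` and `pairwise_is_subregion`, the `(x1, y1, x2, y2)` box layout, and the `threshold` and `eps` parameters are all assumptions made for illustration.

```python
import numpy as np


def _intersection_and_areas(boxes_a, boxes_b):
    """Hypothetical helper: pairwise intersection areas plus per-box areas.

    Boxes are assumed to be rows of (x1, y1, x2, y2) with x1 <= x2, y1 <= y2.
    """
    a = np.asarray(boxes_a, dtype=float)
    b = np.asarray(boxes_b, dtype=float)
    # Broadcasting (N, 1) against (1, M) yields all N x M pairs at once,
    # replacing a nested Python loop over element pairs.
    x1 = np.maximum(a[:, None, 0], b[None, :, 0])
    y1 = np.maximum(a[:, None, 1], b[None, :, 1])
    x2 = np.minimum(a[:, None, 2], b[None, :, 2])
    y2 = np.minimum(a[:, None, 3], b[None, :, 3])
    # Negative widths or heights mean the boxes do not overlap.
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter, area_a, area_b


def pairwise_iou(boxes_a, boxes_b, eps=1e-10):
    """Return an (N, M) matrix of intersection-over-union values."""
    inter, area_a, area_b = _intersection_and_areas(boxes_a, boxes_b)
    union = area_a[:, None] + area_b[None, :] - inter
    # eps guards against division by zero for degenerate boxes.
    return inter / np.maximum(union, eps)


def pairwise_is_subregion(boxes_a, boxes_b, threshold=0.9, eps=1e-10):
    """Return an (N, M) boolean matrix: is a[i] mostly inside b[j]?"""
    inter, area_a, _ = _intersection_and_areas(boxes_a, boxes_b)
    return inter / np.maximum(area_a[:, None], eps) >= threshold


boxes = [[0, 0, 2, 2], [1, 1, 3, 3], [0, 0, 1, 1]]
iou = pairwise_iou(boxes, boxes)
inside = pairwise_is_subregion(boxes, boxes)
```

In this sketch the list of boxes is converted to an array once, up front, so that every later step operates on whole arrays. That conversion step corresponds to the coordinate extraction that the profiling results above identify as the remaining cost.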