Yao You 2dceac34b5
Feat/remove reference of PageLayout.elements (#3943)
This PR removes usage of `PageLayout.elements` from partition function,
except for when `analysis=True`. This PR updates the partition logic so
that `PageLayout.elements_array` is used everywhere to save memory and
cpu cost.
Since the analysis function is intended for investigation and not for
general document processing purposes, this part of the code is left for
a future refactor.

`PageLayout.elements` uses a list to store layout elements' data while
`elements_array` uses `numpy` array to store the data, which has much
lower memory requirements. Using `memory_profiler` to test the
differences is usually around 10x.
2025-03-12 15:21:21 +00:00
..
2024-02-06 17:11:55 +00:00