unstructured/examples/custom-layout-order
Christine Straub b30d6a601e
Fix/1209 tweak xycut ordering output (#1630)
Closes GH Issue #1209.

### Summary
- add swapped `xycut` sorting
- update `xycut` sorting evaluation script

PDFs:
- [sbaa031.073.pdf](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7234218/pdf/sbaa031.073.pdf)
- [multi-column-2p.pdf](https://github.com/Unstructured-IO/unstructured/files/12796147/multi-column-2p.pdf)
- [11723901.pdf](https://github.com/Unstructured-IO/unstructured-inference/files/12360085/11723901.pdf)
### Testing
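```
from unstructured.partition.pdf import partition_pdf

# Partition the PDF, then print the elements in their sorted order.
elements = partition_pdf("sbaa031.073.pdf", strategy="hi_res")
print("\n\n".join([str(el) for el in elements]))
```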
### Evaluation
```
PYTHONPATH=. python examples/custom-layout-order/evaluate_xy_cut_sorting.py sbaa031.073.pdf hi_res xycut_only
```
2023-10-05 07:41:38 +00:00

Custom Layout Sorting

This directory contains examples of how xy-cut sorting works.
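
As background, the general idea behind xy-cut sorting can be sketched in a few lines. This is a minimal, generic illustration and not the implementation used in this repository: the names Box, _split_at_gap and xy_cut_order are hypothetical, boxes are assumed to be (x0, y0, x1, y1) tuples with the origin at the top-left, and the choice to try horizontal cuts before vertical ones is an assumption of this sketch.

```python
from typing import List, Tuple

# Hypothetical box type for this sketch: (x0, y0, x1, y1), origin at top-left.
Box = Tuple[float, float, float, float]


def _split_at_gap(boxes: List[Box], axis: int) -> List[List[Box]]:
    """Group boxes separated by empty gaps along one axis (0 = x, 1 = y)."""
    ordered = sorted(boxes, key=lambda b: b[axis])
    groups = [[ordered[0]]]
    reach = ordered[0][axis + 2]  # farthest extent covered so far
    for box in ordered[1:]:
        if box[axis] >= reach:
            # Nothing seen so far extends past this position: cut here.
            groups.append([box])
        else:
            groups[-1].append(box)
        reach = max(reach, box[axis + 2])
    return groups


def xy_cut_order(boxes: List[Box]) -> List[Box]:
    """Return boxes in reading order using a simple recursive xy-cut."""
    if len(boxes) <= 1:
        return list(boxes)
    # Try to cut the page into rows first (axis 1), then into columns (axis 0).
    for axis in (1, 0):
        groups = _split_at_gap(boxes, axis)
        if len(groups) > 1:
            return [b for g in groups for b in xy_cut_order(g)]
    # No clean cut exists: fall back to top-to-bottom, left-to-right.
    return sorted(boxes, key=lambda b: (b[1], b[0]))


if __name__ == "__main__":
    title = (0, 0, 100, 10)      # spans both columns
    left1 = (0, 20, 45, 40)
    left2 = (0, 45, 45, 70)
    right1 = (55, 20, 100, 50)
    right2 = (55, 55, 100, 70)
    for box in xy_cut_order([right2, left1, title, right1, left2]):
        print(box)
```

On this two-column layout the sketch yields the title first, then the left column from top to bottom, then the right column. A plain top-to-bottom sort would instead interleave the two columns, which is the problem xy-cut ordering is meant to avoid.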

Running the example

Running the script (.py)

export PYTHONPATH=.:$PYTHONPATH && python examples/custom-layout-order/evaluate_xy_cut_sorting.py <file_path> <strategy>

Here, <file_path> must point to a file under the project root directory, and <strategy> is the partitioning strategy, such as fast or hi_res. For example,

export PYTHONPATH=.:$PYTHONPATH && python examples/custom-layout-order/evaluate_xy_cut_sorting.py example-docs/multi-column-2p.pdf fast

Running the Jupyter notebook

The Google Colab version of the notebook can be found here: https://colab.research.google.com/drive/1HgBvHNPnY-dXO043DftvvMeynlLPgQ_p