This PR was initially created to close GitHub Issue #1604 (Synchronizing the default layout model), but since that issue was already resolved in PR [#1607](https://github.com/Unstructured-IO/unstructured/pull/1607), this PR now only adds the visualization script used to investigate it.

### Summary

- Add a Python script to annotate elements.

PDF: [references.pdf](https://github.com/Unstructured-IO/unstructured/files/12778270/references.pdf)

### Evaluation

```
PYTHONPATH=. python examples/layout-analysis/visualization.py references.pdf hi_res
```
# Analyzing Layout Elements
This directory contains examples of how to analyze layout elements.
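As a rough first look at what a partitioning run detected, you can tally element types per page. The following is a minimal sketch, not code from this directory: it assumes the elements have already been serialized to a JSON list (for example from each element's `to_dict()`), where every entry has a `"type"` key and a `"metadata"` dict that may carry `"page_number"`. The function name `summarize_layout` is hypothetical.

```python
import json
from collections import Counter, defaultdict


def summarize_layout(elements_json: str) -> dict:
    """Count element types per page from a JSON list of serialized elements.

    Assumption: each element is a dict with a "type" key and an optional
    "metadata" dict holding "page_number". Elements without a page number
    are grouped under page 0.
    """
    per_page = defaultdict(Counter)
    for element in json.loads(elements_json):
        metadata = element.get("metadata") or {}
        page = metadata.get("page_number") or 0
        per_page[page][element["type"]] += 1
    return dict(per_page)


sample = json.dumps([
    {"type": "Title", "text": "Lorem ipsum", "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "text": "Dolor sit amet.", "metadata": {"page_number": 1}},
    {"type": "NarrativeText", "text": "Consectetur.", "metadata": {"page_number": 2}},
])
for page, counts in sorted(summarize_layout(sample).items()):
    print(page, dict(counts))
# 1 {'Title': 1, 'NarrativeText': 1}
# 2 {'NarrativeText': 1}
```

Grouping by page keeps the summary aligned with how the elements are laid out visually, which makes it easier to compare against an annotated page image.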
## How to run
Run `pip install -r requirements.txt` to install the Python dependencies.
### Visualization

- Python script (`visualization.py`), run from the repository root:

```
$ PYTHONPATH=. python examples/layout-analysis/visualization.py <file_path> <strategy>
```

The strategy can be one of `"auto"`, `"hi_res"`, `"ocr_only"`, or `"fast"`. For example:

```
$ PYTHONPATH=. python examples/layout-analysis/visualization.py example-docs/loremipsum.pdf hi_res
```
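The script itself is not reproduced here, but one step any element-annotation tool needs is mapping an element's coordinates onto the rendered page image before drawing a box. The following is a minimal sketch under stated assumptions: the element's polygon is a list of `(x, y)` points in a layout space of known width and height, with the origin at the top-left and `y` increasing downward. Serialized element metadata usually exposes these as `points`, `layout_width`, and `layout_height` under `coordinates`, but treat those field names as an assumption to check against your installed version. `to_pixel_box` is a hypothetical helper.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def to_pixel_box(
    points: Sequence[Point],
    layout_size: Tuple[float, float],
    image_size: Tuple[int, int],
) -> Tuple[int, int, int, int]:
    """Map an element's polygon onto a rendered page image.

    Scales each point from the layout space (width, height) to the image
    size (width, height), then returns the axis-aligned bounding box as
    (left, top, right, bottom) in whole pixels. Assumes a top-left origin
    in both spaces, so no y-axis flip is applied.
    """
    layout_w, layout_h = layout_size
    image_w, image_h = image_size
    scale_x, scale_y = image_w / layout_w, image_h / layout_h
    xs = [x * scale_x for x, _ in points]
    ys = [y * scale_y for _, y in points]
    return (round(min(xs)), round(min(ys)), round(max(xs)), round(max(ys)))


# A box from a 1700x2200 layout drawn on an 850x1100 image (half scale).
box = to_pixel_box([(100, 200), (100, 300), (500, 300), (500, 200)], (1700, 2200), (850, 1100))
print(box)  # (50, 100, 250, 150)
```

The returned `(left, top, right, bottom)` tuple is in a form Pillow's `ImageDraw.Draw(image).rectangle(...)` accepts, which is one straightforward way to draw the annotation boxes. Taking the bounding box of all points, rather than assuming a fixed corner order, keeps the helper correct regardless of how the polygon's vertices are ordered.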