Mirror of https://github.com/Unstructured-IO/unstructured.git (synced 2025-07-03 23:20:35 +00:00)

### Summary

This PR is the second part of the "layout analysis" refactor, which moves layout analysis from the unstructured-inference repo to the unstructured repo. The first part was done in https://github.com/Unstructured-IO/unstructured-inference/pull/305. This PR adds logic to support annotating `inferred` and `extracted` elements.

### Testing

```
PYTHONPATH=. python examples/layout-analysis/visualization.py <file_path> <strategy> <document_type>
```

For example:

```
PYTHONPATH=. python examples/layout-analysis/visualization.py example-docs/layout-parser-paper-fast.pdf hi_res pdf
```

Analyzing Layout Elements
This directory contains examples of how to analyze layout elements.
How to run
Run `pip install -r requirements.txt` to install the Python dependencies.
Visualization
- Python script (visualization.py)
$ PYTHONPATH=. python examples/layout-analysis/visualization.py <file_path> <strategy>
The strategy can be one of "auto", "hi_res", "ocr_only", or "fast". For example,
$ PYTHONPATH=. python examples/layout-analysis/visualization.py example-docs/loremipsum.pdf hi_res
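
To run the script over several files or strategies, the command above can be wrapped in a small Python helper. The following is a minimal sketch, not part of the repository: `build_command` and `run_visualization` are hypothetical names, and the sketch assumes it is executed from the repository root so that the script path and `PYTHONPATH=.` resolve as in the commands above. Any extra positional arguments (such as the `<document_type>` shown in the PR description) are passed through unchanged.

```python
import os
import subprocess
import sys
from typing import List

# Strategy names as listed in this README.
VALID_STRATEGIES = ("auto", "hi_res", "ocr_only", "fast")

# Script path as used in the commands above, relative to the repository root.
SCRIPT = "examples/layout-analysis/visualization.py"


def build_command(file_path: str, strategy: str, *extra_args: str) -> List[str]:
    """Build the argument list for one visualization run (hypothetical helper)."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(
            f"unknown strategy {strategy!r}; expected one of {VALID_STRATEGIES}"
        )
    # Use the current interpreter so the run stays in the same virtualenv.
    return [sys.executable, SCRIPT, file_path, strategy, *extra_args]


def run_visualization(
    file_path: str, strategy: str, *extra_args: str, repo_root: str = "."
) -> int:
    """Run the script from repo_root with PYTHONPATH=. and return its exit code."""
    env = {**os.environ, "PYTHONPATH": "."}
    completed = subprocess.run(
        build_command(file_path, strategy, *extra_args),
        cwd=repo_root,
        env=env,
    )
    return completed.returncode
```

Calling `run_visualization("example-docs/loremipsum.pdf", "hi_res")` is then equivalent to the second command above. Validating the strategy before spawning the process surfaces a typo immediately, rather than after the script has started loading its dependencies.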