Mirror of https://github.com/Unstructured-IO/unstructured.git, synced 2025-07-09 01:55:55 +00:00.

This PR was initially created to close GitHub Issue #1604 (Synchronizing the default layout model), but since it was already resolved in PR [#1607](https://github.com/Unstructured-IO/unstructured/pull/1607), this PR now only adds the visualization script used to investigate the issue.

### Summary

- add python script to annotate elements

PDF: [references.pdf](https://github.com/Unstructured-IO/unstructured/files/12778270/references.pdf)

### Evaluation

```
PYTHONPATH=. python examples/layout-analysis/visualization.py references.pdf hi_res
```
# Analyzing Layout Elements

This directory contains examples of how to analyze layout elements.

## How to run

Run `pip install -r requirements.txt` to install the Python dependencies.

### Visualization

- Python script (visualization.py)

```
$ PYTHONPATH=. python examples/layout-analysis/visualization.py <file_path> <strategy>
```

The strategy can be one of "auto", "hi_res", "ocr_only", or "fast". For example,

```
$ PYTHONPATH=. python examples/layout-analysis/visualization.py example-docs/loremipsum.pdf hi_res
```
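
The command line above takes two positional arguments, a file path and a strategy. As a rough illustration of that interface, the following is a minimal sketch of how such arguments could be validated with the standard library's `argparse`. It is not the code in `visualization.py`; the names `STRATEGIES`, `build_parser`, and `parse_cli` are hypothetical and exist only for this example.

```python
import argparse

# Strategy names as listed in this README; the tuple name is hypothetical.
STRATEGIES = ("auto", "hi_res", "ocr_only", "fast")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the two positional arguments shown in the usage line."""
    parser = argparse.ArgumentParser(
        description="Annotate layout elements in a document (illustrative sketch)."
    )
    parser.add_argument("file_path", help="path to the input document, e.g. a PDF")
    parser.add_argument(
        "strategy",
        choices=STRATEGIES,
        help="partitioning strategy to use",
    )
    return parser


def parse_cli(argv):
    """Parse an argument list such as sys.argv[1:]; exits on an unknown strategy."""
    return build_parser().parse_args(argv)


# Example: the same arguments as the hi_res invocation in this README.
args = parse_cli(["example-docs/loremipsum.pdf", "hi_res"])
print(args.file_path, args.strategy)
```

Using `choices` means a misspelled strategy is rejected with a usage message before any document processing starts, which is one reasonable design for a script with a small, fixed set of modes.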