Christine Straub f673ea4c40
enhancement: add visualization script to annotate elements (#1613)
This PR was initially created to close GitHub Issue #1604 (Synchronizing the default
layout model). Since that issue was already resolved in PR
[#1607](https://github.com/Unstructured-IO/unstructured/pull/1607), this
PR now only adds the visualization script that was used to investigate it.

### Summary
- add a Python script to annotate elements

PDF:
[references.pdf](https://github.com/Unstructured-IO/unstructured/files/12778270/references.pdf)

### Evaluation
```
PYTHONPATH=. python examples/layout-analysis/visualization.py references.pdf hi_res
```
2023-10-05 12:53:16 -07:00


Analyzing Layout Elements

This directory contains examples of how to analyze layout elements.

How to run

Run pip install -r requirements.txt to install the Python dependencies.

Visualization

  • Python script (visualization.py)
$ PYTHONPATH=. python examples/layout-analysis/visualization.py <file_path> <strategy>

The strategy can be one of "auto", "hi_res", "ocr_only", or "fast". For example,

$ PYTHONPATH=. python examples/layout-analysis/visualization.py example-docs/loremipsum.pdf hi_res
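
The command-line interface described above takes two positional arguments, a file path and a strategy. As a rough illustration of how such an interface could be validated, here is a minimal sketch using the standard-library `argparse` module. It is not the contents of `visualization.py`: the names `STRATEGIES`, `build_parser`, and `parse_args` are hypothetical, and the real script may read its arguments differently.

```python
import argparse

# Strategy names as listed in this README; assumed here to be the full set.
STRATEGIES = ("auto", "hi_res", "ocr_only", "fast")


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for `<file_path> <strategy>` (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        prog="visualization.py",
        description="Annotate the layout elements detected in a document.",
    )
    parser.add_argument("file_path", help="path to the document to analyze")
    # `choices` makes argparse reject any strategy outside the known set.
    parser.add_argument("strategy", choices=STRATEGIES, help="partitioning strategy")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    """Parse `argv`, or sys.argv[1:] when `argv` is None."""
    return build_parser().parse_args(argv)


# Equivalent to the example invocation shown above.
args = parse_args(["example-docs/loremipsum.pdf", "hi_res"])
print(args.file_path, args.strategy)
```

With this approach, an unsupported strategy causes `argparse` to print a usage message and exit before any document processing starts.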