Christine Straub 096d23bc28
Refactor: support layout analysis (#2273)
### Summary
This PR is the second part of the "layout analysis" refactor, which moves
layout analysis from the unstructured-inference repo to the unstructured
repo. The first part was done in
https://github.com/Unstructured-IO/unstructured-inference/pull/305. This
PR adds logic to support annotating `inferred` and `extracted` elements.

### Testing

```
PYTHONPATH=. python examples/layout-analysis/visualization.py <file_path> <strategy> <document_type>
```
For example:
```
PYTHONPATH=. python examples/layout-analysis/visualization.py example-docs/layout-parser-paper-fast.pdf hi_res pdf
```
2023-12-19 06:21:56 +00:00

Analyzing Layout Elements

This directory contains examples of how to analyze layout elements.

How to run

Run `pip install -r requirements.txt` to install the Python dependencies.

Visualization

  • Python script (visualization.py)
$ PYTHONPATH=. python examples/layout-analysis/visualization.py <file_path> <strategy>

The strategy must be one of "auto", "hi_res", "ocr_only", or "fast". For example:

$ PYTHONPATH=. python examples/layout-analysis/visualization.py example-docs/loremipsum.pdf hi_res
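
To call the script from other Python code, the command above can be wrapped in a small helper that rejects an unknown strategy before anything is launched. This is a minimal sketch, not part of the repository: `build_command` and `run_visualization` are hypothetical names, and the sketch assumes the script takes exactly the two positional arguments shown above and is run from the repository root.

```python
import os
import subprocess
import sys
from typing import List

# Strategy names as listed in this README.
VALID_STRATEGIES = ("auto", "hi_res", "ocr_only", "fast")

# Script path as used in the commands above (relative to the repository root).
SCRIPT = "examples/layout-analysis/visualization.py"


def build_command(file_path: str, strategy: str) -> List[str]:
    """Hypothetical helper: build the argv list for visualization.py."""
    if strategy not in VALID_STRATEGIES:
        raise ValueError(
            f"unknown strategy {strategy!r}; expected one of {VALID_STRATEGIES}"
        )
    return [sys.executable, SCRIPT, file_path, strategy]


def run_visualization(file_path: str, strategy: str) -> None:
    """Hypothetical helper: run the script with PYTHONPATH=. as in the shell command."""
    env = {**os.environ, "PYTHONPATH": "."}
    subprocess.run(build_command(file_path, strategy), env=env, check=True)
```

For instance, `run_visualization("example-docs/loremipsum.pdf", "hi_res")` corresponds to the shell command above, while an unsupported strategy name raises `ValueError` without starting the script.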