unstructured/example-docs/layout-parser-paper-with-table.pdf
Christine Straub 210d53a7e0
Fix: missing columns on table ingest output after table OCR refactor (#1959)
Closes #1873.
### Summary
The table OCR refactor changed the default padding used when cropping
table images from
[12](https://github.com/Unstructured-IO/unstructured-inference/blob/main/unstructured_inference/inference/layoutelement.py#L95)
to
[0](https://github.com/Unstructured-IO/unstructured/blob/main/unstructured/partition/ocr.py#L260).
With no padding, some table columns were missing from the output.
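
As a rough illustration of why the padding value matters, the sketch below expands a table bounding box by a fixed number of pixels before cropping. The helper `pad_bbox`, the sample coordinates, and the page size are hypothetical and are not taken from `unstructured` or `unstructured-inference`; the sketch only shows how a padding of 0 leaves a tight crop while a padding of 12 keeps a margin around the table edges.

```python
from typing import Tuple

# (x1, y1, x2, y2) in pixels; hypothetical type alias for illustration only.
BBox = Tuple[int, int, int, int]


def pad_bbox(bbox: BBox, pad: int, image_size: Tuple[int, int]) -> BBox:
    """Expand bbox by `pad` pixels on every side, clamped to the image bounds."""
    x1, y1, x2, y2 = bbox
    width, height = image_size
    return (
        max(0, x1 - pad),
        max(0, y1 - pad),
        min(width, x2 + pad),
        min(height, y2 + pad),
    )


# Assumed sample values, not measurements from the example PDF.
table_bbox = (100, 200, 500, 400)
page_size = (612, 792)

tight = pad_bbox(table_bbox, 0, page_size)    # crop hugs the detected box
padded = pad_bbox(table_bbox, 12, page_size)  # crop keeps a 12-pixel margin
```

If the detected box is slightly too narrow, the tight crop can cut through the outermost column, whereas the padded crop still contains it.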
### Testing
```
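from unstructured.partition import pdf

filename = "example-docs/layout-parser-paper-with-table.pdf"
elements = pdf.partition_pdf(
    filename=filename,
    strategy="hi_res",
    infer_table_structure=True,
)
# Collect the HTML rendering of every element that has one (i.e. tables).
table = [el.metadata.text_as_html for el in elements if el.metadata.text_as_html]
# "Large Model" should be present once no columns are dropped.
assert "Large Model" in table[0]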
```

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: christinestraub <christinestraub@users.noreply.github.com>
2023-11-01 18:34:27 +00:00
