4 Commits

Author SHA1 Message Date
Austin Walker
dd243b4fd9
chore: pass ocr_mode in partition_pdf_or_image (#1154)
Set ocr_mode to `individual_blocks` for now to work around [this
bug](https://github.com/Unstructured-IO/unstructured-inference/issues/179).

I verified this by printing the current ocr_mode inside unstructured-inference:
the `entire_page` default is overridden.

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: awalker4 <awalker4@users.noreply.github.com>
2023-08-18 20:59:08 +00:00
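The override described in the commit above can be illustrated with a minimal Python sketch. It assumes a simplified call chain: `partition_sketch` and `run_inference_ocr` are hypothetical stand-ins, not the real `partition_pdf_or_image` or unstructured-inference internals. The point it shows is only that forwarding `ocr_mode` explicitly wins over the downstream `entire_page` default.

```python
# Hypothetical sketch: function names are illustrative stand-ins, not the
# actual unstructured / unstructured-inference internals.

# Assumed downstream default, per the commit message.
INFERENCE_DEFAULT_OCR_MODE = "entire_page"


def run_inference_ocr(ocr_mode: str = INFERENCE_DEFAULT_OCR_MODE) -> str:
    """Stand-in for the inference-side OCR step; returns the mode it runs with."""
    return ocr_mode


def partition_sketch(ocr_mode: str = "individual_blocks") -> str:
    """Stand-in for partition_pdf_or_image: forwards ocr_mode explicitly."""
    # Passing the argument explicitly overrides the downstream default.
    return run_inference_ocr(ocr_mode=ocr_mode)


# Mirrors the "verify by printing" step from the commit message.
print(partition_sketch())   # individual_blocks
print(run_inference_ocr())  # entire_page
```

Calling the inference step directly still falls back to `entire_page`; only callers that forward the argument get `individual_blocks`.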
cragwolfe
dd0f582585
build(deps): bump unstructured-inference==0.5.13 (#1141)
Bump to unstructured-inference==0.5.13, which includes:

* Fix for extracted image elements being included in the layout merge. This addresses the issue
where an entire-page image in a PDF was not passed to the layout model when using hi_res.
2023-08-17 06:25:00 +00:00
Christine Straub
0a23139720
enhancement: implement full-page OCR (#1133)
* Implements full-page OCR as supported in unstructured-inference==0.5.11.
2023-08-16 19:16:35 +00:00
Yuming Long
b4fe40e484
Chore[ingest]: adding parameter --partition-pdf-infer-table-structure (#1056)
* add param

* expected test

* add option (to do doc nit)

* test with api for now

* typo

* test with api key

* use local only

* encoding -> partition-encoding

* changelog and version

* Update ingest test fixtures (#1055)

Co-authored-by: yuming-long <yuming-long@users.noreply.github.com>

* ignore coordinates

* no whitespace lol

* Update ingest test fixtures (#1061)

Co-authored-by: yuming-long <yuming-long@users.noreply.github.com>

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: yuming-long <yuming-long@users.noreply.github.com>
2023-08-08 18:11:06 -04:00
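How a flag such as `--partition-pdf-infer-table-structure` (and the renamed `--partition-encoding`) might be turned into partitioning keyword arguments can be sketched with Python's standard `argparse`. This is an assumption for illustration only: the real ingest CLI may use a different argument library, and the keyword names `pdf_infer_table_structure` and `encoding` in `partition_kwargs` are hypothetical.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser; the real ingest CLI may be built differently.
    parser = argparse.ArgumentParser(prog="ingest-sketch")
    parser.add_argument(
        "--partition-pdf-infer-table-structure",
        action="store_true",  # defaults to False when the flag is absent
        help="Ask the PDF partitioner to infer table structure.",
    )
    parser.add_argument(
        "--partition-encoding",
        default=None,
        help="Text encoding passed through to partitioning.",
    )
    return parser


def partition_kwargs(args: argparse.Namespace) -> dict:
    # argparse turns dashes into underscores when deriving attribute names.
    return {
        "pdf_infer_table_structure": args.partition_pdf_infer_table_structure,
        "encoding": args.partition_encoding,
    }


args = build_parser().parse_args(["--partition-pdf-infer-table-structure"])
print(partition_kwargs(args))
# {'pdf_infer_table_structure': True, 'encoding': None}
```

Prefixing every option with `partition-` keeps partitioning settings in their own namespace, which is the motivation suggested by the `encoding -> partition-encoding` rename.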