unstructured

mirror of https://github.com/Unstructured-IO/unstructured.git synced 2026-01-07 12:50:54 +00:00


fix: add language to OCRAgentGoogleVision constructor (#3696)

This PR addresses issue #3659 by adding an optional `language` parameter
to the `OCRAgentGoogleVision` class constructor.

This parameter serves as a "language hint" for the
`document_text_detection` method in the `ImageAnnotatorClient`. For more
information on language hints, refer to the [Google Cloud Vision
documentation](https://cloud.google.com/vision/docs/languages).


**Default Behavior**:
The `language` parameter defaults to `None`, which lets Google Cloud Vision
auto-detect the language, as recommended in their documentation.

**Purpose**: 
This change is necessary because the `OCRAgent`'s `get_instance` method
expects all `OCRAgent`s to include a language parameter in their
constructors.

**Context on Issue**:
When trying to parse a PDF with
`OCR_AGENT=unstructured.partition.utils.ocr_models.google_vision_ocr.OCRAgentGoogleVision`,
an error occurs in the `get_instance` method. The method expects a
`language` parameter, which the current `OCRAgentGoogleVision`
constructor does not support, leading to a positional argument error.
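
The failure and the fix can be illustrated with a minimal, self-contained sketch. It does not use the real `unstructured` classes: `LegacyAgent`, `PatchedAgent`, `language_hints`, and this simplified `get_instance` are hypothetical stand-ins, and it assumes the factory passes the language as a single positional argument, as the error described above suggests.

```python
from typing import List, Optional


class LegacyAgent:
    """Hypothetical agent whose constructor accepts no language argument."""

    def __init__(self) -> None:
        self.language = None


class PatchedAgent:
    """Hypothetical agent with an optional language hint that defaults to None."""

    def __init__(self, language: Optional[str] = None) -> None:
        # None means "no hint", leaving language detection to the OCR backend.
        self.language = language

    def language_hints(self) -> List[str]:
        # Assumption: a hint list is only populated when a language was given.
        return [self.language] if self.language else []


def get_instance(agent_cls: type, language: str):
    """Simplified stand-in for a factory that always passes a language."""
    return agent_cls(language)


if __name__ == "__main__":
    try:
        get_instance(LegacyAgent, "eng")
    except TypeError as exc:
        # The constructor takes only `self`, so the extra argument is rejected.
        print(f"legacy agent failed: {exc}")

    agent = get_instance(PatchedAgent, "eng")
    print(agent.language_hints())
```

Because the parameter is optional, code that constructs the agent with no arguments keeps working, while the factory path can now supply a hint.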

---------

Co-authored-by: Christine Straub <christinemstraub@gmail.com>

2024-10-14 05:35:05 +00:00

chunking

rfctr(part): prepare for pluggable auto-partitioners 2 (#3657)

2024-09-24 17:33:25 +00:00

cleaners

rfctr: prepare to add orig_elements serde (#2668)

2024-03-20 21:27:59 +00:00

common

feat(chunk): split tables on even row boundaries (#3504)

2024-08-19 18:56:53 +00:00

documents

rfctr(part): remove double-decoration 4 (#3690)