Daniel Bichuetti 28724e2e25
feat: add automatic OCR detection mechanism and improve performance (#4329)
* feat: add automatic OCR detection mechanism and improve performance

* refactor: add error message

* refactor: ignore pdftoppm bad typing

* refactor: add Tesseract install. docstrings

* fix: check if OCR var. assigned on mp

* tests: add path to windows/linux tests

* tests: add tessdata path

* tests: include matrix ref.

* tests: custom Tesseract matrix install

* refactor: improve user guide

* tests: fix macos path

* tests: remove brew formulae version

* fix: macos paths

* tests: fix macos path

* tests: add Tesseract to Windows Path

* tests: pytesseract path

* tests: macos path

* refactor: fix path message and remove extra path from tests

* refactor: raise exception when path not found

* refactor: expression simplification

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* refactor: check ocr parameter

* tests: mark as integration

* tests: mock deprecation warning

* refactor: simplify code

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* refactor: change deprecation test

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>

* refactor: add unit patch

* refactor: black formatting

---------

Co-authored-by: Silvano Cerza <3314350+silvanocerza@users.noreply.github.com>
Co-authored-by: Mayank Jobanputra <mayankjobanputra@gmail.com>
2023-03-13 20:19:22 +05:30
..