fix: pytesseract>=0.3.12 installation error while installing pdf extra (#3522)

Closes #3521.

This PR resolves an installation error with `pytesseract>=0.3.12` that
occurred during `pip install unstructured[pdf]==0.15.3`.

### Testing
**Run the following command on the `main` branch and on this PR's branch**
```
pip uninstall -y pytesseract && pip install ".[pdf]"
```
**Results**
- `main` branch
```
INFO: pip is looking at multiple versions of unstructured[pdf] to determine which version is compatible with other requirements. This could take a while.
ERROR: Could not find a version that satisfies the requirement pytesseract>=0.3.12; extra == "pdf" (from unstructured[pdf]) (from versions: 0.1, 0.1.3, 0.1.4, 0.1.5, 0.1.6, 0.1.7, 0.1.8, 0.1.9, 0.2.0, 0.2.2, 0.2.4, 0.2.5, 0.2.6, 0.2.7, 0.2.8, 0.2.9, 0.3.0, 0.3.1, 0.3.2, 0.3.3, 0.3.4, 0.3.5, 0.3.6, 0.3.7, 0.3.8, 0.3.9, 0.3.10)
ERROR: No matching distribution found for pytesseract>=0.3.12; extra == "pdf"
```
- this PR

`pytesseract-0.3.13` should be installed successfully.
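
To confirm the result after the install command finishes, a small check along these lines may help. It is a minimal sketch, not part of this PR: the helper names `parse_release`, `meets_minimum`, and `installed_version` are hypothetical, and the comparison assumes plain dotted release strings such as `0.3.13` (suffixes like `.post0` or `rc1` are not handled and would raise `ValueError`).

```python
from importlib.metadata import PackageNotFoundError, version


def parse_release(version_str):
    """Turn a plain dotted release such as '0.3.13' into a tuple of ints."""
    return tuple(int(part) for part in version_str.split("."))


def meets_minimum(version_str, minimum):
    """Return True if version_str is at least minimum (plain releases only)."""
    return parse_release(version_str) >= parse_release(minimum)


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version("pytesseract")
    if found is None:
        print("pytesseract is not installed")
    else:
        ok = meets_minimum(found, "0.3.12")
        print(f"pytesseract {found} satisfies >=0.3.12: {ok}")
```

The same comparison shows why resolution fails on `main`: the newest version in the error output, `0.3.10`, does not satisfy `>=0.3.12`.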
Christine Straub 2024-08-14 14:15:40 -07:00 committed by GitHub
parent d6a84bdfbb
commit 9b778e270d
5 changed files with 16 additions and 7 deletions

@@ -1,3 +1,13 @@
+## 0.15.4
+### Enhancements
+### Features
+### Fixes
+* **Resolve an installation error with `pytesseract>=0.3.12` that occurred during `pip install unstructured[pdf]==0.15.3`.**
 ## 0.15.3
 ### Enhancements

@@ -22,8 +22,7 @@ Office365-REST-Python-Client<2.4.3
 # unstructured-inference to be upgraded when unstructured library is upgraded
 # https://github.com/Unstructured-IO/unstructured/issues/1458
 # unstructured-inference
-# use the known compatible version of weaviate and pytesseract
-pytesseract @ git+https://github.com/madmaze/pytesseract.git@v0.3.13
+# use the known compatible version of weaviate
 weaviate-client>3.25.0
 # TODO: Pinned in transformers package, remove when that gets updated
 tokenizers>=0.19,<0.20

@@ -12,4 +12,6 @@ effdet
 # Do not move to constraints.in, otherwise unstructured-inference will not be upgraded
 # when unstructured library is.
 unstructured-inference==0.7.36
-pytesseract>=0.3.12
+# NOTE(christine): Pinned to a specific version of pytesseract from the GitHub repository.
+# Remove this pin and switch to the latest version from PyPI once version 0.3.13 or newer is officially released.
+pytesseract @ git+https://github.com/madmaze/pytesseract.git@v0.3.13

@@ -202,9 +202,7 @@ pypdf==4.3.1
 pypdfium2==4.30.0
 # via pdfplumber
 pytesseract @ git+https://github.com/madmaze/pytesseract.git@v0.3.13
-# via
-# -c ././deps/constraints.txt
-# -r ./extra-pdf-image.in
+# via -r ./extra-pdf-image.in
 python-dateutil==2.9.0.post0
 # via
 # -c ./base.txt

@@ -1 +1 @@
-__version__ = "0.15.3" # pragma: no cover
+__version__ = "0.15.4" # pragma: no cover