build(deps): bump unstructured.paddleocr 2.8.0.1 (#3388)

### Summary
- Bump unstructured.paddleocr to `2.8.0.1`, which drops the `lmdb`
dependency because of a licensing issue.
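To confirm after upgrading that the installed package no longer declares `lmdb`, you can inspect its requirements with the standard library's `importlib.metadata`. The following is a minimal sketch and is not part of this commit. The helpers `requirement_name` and `declares_dependency` are hypothetical. The name handling is a simplified version of PEP 508 parsing with PEP 503 normalization; the `packaging` library would be the robust choice.

```python
import re
from importlib import metadata


def normalize(name: str) -> str:
    # PEP 503-style normalization: lowercase, collapse runs of "-", "_", "." into "-"
    return re.sub(r"[-_.]+", "-", name).lower()


def requirement_name(req: str) -> str:
    # Simplified: take the leading project name from a requirement string,
    # e.g. "lmdb>=1.0; python_version > '3.8'" -> "lmdb". Not a full PEP 508 parser.
    match = re.match(r"\s*([A-Za-z0-9][A-Za-z0-9._-]*)", req)
    if match is None:
        raise ValueError(f"cannot parse requirement: {req!r}")
    return normalize(match.group(1))


def declares_dependency(requirements, name: str) -> bool:
    # True if any requirement names `name`. Markers are not evaluated,
    # so extras-only or platform-gated requirements also count.
    target = normalize(name)
    return any(requirement_name(r) == target for r in requirements or [])


if __name__ == "__main__":
    try:
        # metadata.requires() returns None when a distribution declares no requirements
        reqs = metadata.requires("unstructured.paddleocr")
    except metadata.PackageNotFoundError:
        print("unstructured.paddleocr is not installed")
    else:
        print("declares lmdb:", declares_dependency(reqs, "lmdb"))
```

Normalizing both sides means `unstructured.paddleocr`, `unstructured_paddleocr` and `unstructured-paddleocr` all compare equal. This matters because the commit itself uses the dotted spelling in one file and the hyphenated spelling in the other.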

---------

Co-authored-by: Matt Robinson <mrobinson@unstructured.io>
Christine Straub 2024-07-13 20:43:44 -07:00 committed by GitHub
parent 69cddf5f89
commit 3e1a30d338
GPG Key ID: B5690EEEBB952194
4 changed files with 5 additions and 7 deletions

@@ -1,8 +1,8 @@
-## 0.15.0-dev8
+## 0.15.0-dev9
 ### Enhancements
-* **Bump unstructured.paddleocr to 2.8.0.**
+* **Bump unstructured.paddleocr to 2.8.0.1.**
 * **Refine HTML parser to accommodate block element nested in phrasing.** HTML parser no longer raises on a block element (e.g. `<p>`, `<div>`) nested inside a phrasing element (e.g. `<strong>` or `<cite>`). Instead it breaks the phrasing run (and therefore element) at the block-item start and begins a new phrasing run after the block-item. This is consistent with how the browser determines element boundaries in this situation.
 * **Install rewritten HTML parser to fix 12 existing bugs and provide headroom for refinement and growth.** A rewritten HTML parser resolves a collection of outstanding bugs with HTML partitioning and provides a firm foundation for further elaborating that important partitioner.
 * **CI check for dependency licenses** Adds a CI check to ensure dependencies are appropriately licensed.
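The HTML parser changelog entry above describes how a block element nested inside a phrasing element is handled. The sketch below illustrates that behavior with a small toy built on the standard library's `html.parser`. It is not unstructured's parser. The `BLOCK_TAGS` set and the `split_runs` helper are hypothetical, and the toy only records where text runs break; it does not build elements.

```python
from html.parser import HTMLParser

# Hypothetical, deliberately small set of block-level tags for illustration.
BLOCK_TAGS = {"p", "div", "h1", "h2", "h3", "li", "blockquote"}


class PhrasingRunSplitter(HTMLParser):
    """Toy parser: collects text into runs and starts a new run at every
    block-element boundary, even when the block is nested inside a
    phrasing element such as <strong> or <cite>."""

    def __init__(self) -> None:
        super().__init__()
        self.runs: list[str] = []
        self._current: list[str] = []

    def _flush(self) -> None:
        # Close the current run, collapsing whitespace; skip empty runs.
        text = " ".join("".join(self._current).split())
        if text:
            self.runs.append(text)
        self._current = []

    def handle_starttag(self, tag, attrs):
        if tag in BLOCK_TAGS:
            self._flush()  # break the phrasing run at the block-item start

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self._flush()  # begin a new run after the block-item

    def handle_data(self, data):
        self._current.append(data)

    def close(self) -> None:
        super().close()  # process any buffered input first
        self._flush()


def split_runs(html: str) -> list[str]:
    parser = PhrasingRunSplitter()
    parser.feed(html)
    parser.close()
    return parser.runs
```

For example, `split_runs("<p>Hello <strong>bold <div>block</div> tail</strong> end</p>")` returns `["Hello bold", "block", "tail end"]`. The `<strong>` run is split where the `<div>` opens, and a new run starts after the `<div>` closes. This works because `html.parser` reports tags in source order and does not apply the browser's tree-construction rules.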

@@ -1,4 +1,4 @@
 -c ./deps/constraints.txt
 -c base.txt
-unstructured.paddleocr==2.8.0
+unstructured.paddleocr==2.8.0.1

@@ -49,8 +49,6 @@ lanms-neo==1.0.2
 # via unstructured-paddleocr
 lazy-loader==0.4
 # via scikit-image
-lmdb==1.5.1
-# via unstructured-paddleocr
 lxml==5.2.2
 # via
 # -c ./base.txt
@@ -154,7 +152,7 @@ tqdm==4.66.4
 # via
 # -c ./base.txt
 # unstructured-paddleocr
-unstructured-paddleocr==2.8.0
+unstructured-paddleocr==2.8.0.1
 # via -r ./extra-paddleocr.in
 urllib3==1.26.19
 # via
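In the compiled requirements hunks above, `lmdb==1.5.1` had a single annotation, `# via unstructured-paddleocr`. After the bump to 2.8.0.1 it no longer appears in the output. The sketch below makes that reasoning explicit by reading the `# via` annotations and listing packages whose only recorded reason is a given parent. It assumes the pip-compile-style layout shown in this diff. `parse_lock` and `pulled_in_only_by` are hypothetical helpers, and the parsing is simplified: option lines such as `--hash=...` are skipped, and any stray comment after the last pin would be attached to that pin.

```python
import re

# A pinned line such as "lmdb==1.5.1"
PIN = re.compile(r"^([A-Za-z0-9][A-Za-z0-9._-]*)==(\S+)")
# A "# via" annotation, optionally followed by a single reason on the same line
VIA = re.compile(r"^via\b\s*(.*)$")


def normalize(name: str) -> str:
    # PEP 503-style normalization so "unstructured.paddleocr" == "unstructured-paddleocr"
    return re.sub(r"[-_.]+", "-", name).lower()


def parse_lock(text: str) -> dict[str, tuple[str, list[str]]]:
    """Map each pinned package to (version, reasons from its '# via' comments)."""
    packages: dict[str, tuple[str, list[str]]] = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        pin = PIN.match(line)
        if pin:
            current = normalize(pin.group(1))
            packages[current] = (pin.group(2), [])
            continue
        if current is None or not line.startswith("#"):
            continue  # header comments, blank lines, option lines
        comment = line.lstrip("#").strip()
        via = VIA.match(comment)
        # "# via X" carries X inline; "#   X" continues a multi-line "# via"
        reason = via.group(1) if via else comment
        if reason:
            packages[current][1].append(reason)
    return packages


def pulled_in_only_by(packages, parent: str) -> list[str]:
    """Packages whose only recorded reason is `parent`: the ones expected
    to drop out of the compiled file if `parent` stopped requiring them."""
    target = normalize(parent)
    return sorted(
        name
        for name, (_, reasons) in packages.items()
        if [normalize(r) for r in reasons] == [target]
    )
```

When this is applied to the pre-bump lines from the first hunk, `pulled_in_only_by(parse_lock(text), "unstructured.paddleocr")` singles out `lmdb`. `lxml` is not listed because `-c ./base.txt` also constrains it.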

@@ -1 +1 @@
-__version__ = "0.15.0-dev8" # pragma: no cover
+__version__ = "0.15.0-dev9" # pragma: no cover