5 Commits

Author SHA1 Message Date
Yuming Long
55ad5fd637
fix chucking text None type has no attribute stripe (#4018)
### Summary
To fix error `Error in chunk: 512: {"detail":"'NoneType' object has no
attribute 'strip'"}` I found the logs under same org (could assume this
is the same job)

screenshot:
![Screenshot 2025-06-11 at 10 15
57 AM](https://github.com/user-attachments/assets/c50ada55-eef1-43f7-9e27-9b9ae339a6fb)

stack trace from the `utic-api` ES log doc:
![Screenshot 2025-06-11 at 2 01
01 PM](https://github.com/user-attachments/assets/7e84fa24-4eb6-45e8-b195-a11d3d124bfa)



### Notes
longer term we should make partitioner (vlm + utic-api) not return text
with Null

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: yuming-long <yuming-long@users.noreply.github.com>
2025-06-12 18:28:46 +00:00
Yao You
37d2f021a3
Feat/bump inference (#4013)
Bump `unstructured-inference` to `1.0.5`, which includes fix to ensure
model init is thread safe.

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: badGarnet <badGarnet@users.noreply.github.com>
2025-06-06 09:52:17 +00:00
luke-kucing
a7e90f7990
resolve CVEs and HF issue (#4009)
update reqs to resolve CVEs and add the HF ENV to stop it from reaching
out

updated the Dockerfile with
ENV HF_HUB_OFFLINE=1

to stop it from pinging HF. This was an issue for a gov customer. and
updated requirements to resolve some open CVEs

---------

Co-authored-by: cragwolfe <crag@unstructured.io>
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: luke-kucing <luke-kucing@users.noreply.github.com>
2025-06-04 18:52:58 +00:00
cragwolfe
dfa17bd3a0
fix: hi_res PDF parsing: only uncategorized text for extracted elements (#3975) 2025-04-04 14:38:23 -07:00
Marek Połom
f333d7fe7f
feat: Json elements to HTML converter (#3936)
## NOTE
`test_unstructured_ingest/expected-structured-output-html` contains all
test HTML fixtures. Original JSON files, from which these HTML fixtures
are generated, were taken from
`test_unstructured_ingest/expected-structured-output`
2025-03-04 13:57:35 +00:00