mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-07-22 16:37:33 +00:00

- there are multiple places setting the default `hi_res_model_name` in both `unstructured` and `unstructured-inference`
- they lead to inconsistency and unexpected behaviors
- this fix removes a helper in `unstructured` that tries to set the default hi_res layout detection model; instead we rely on the `unstructured-inference` to provide that default when no explicit model name is passed in

## test

```bash
UNSTRUCTURED_INCLUDE_DEBUG_METADATA=true ipython
```

```python
from unstructured.partition.auto import partition

# find a pdf file
elements = partition("foo.pdf", strategy="hi_res")
assert elements[0].metadata.detection_origin == "yolox"
```

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: badGarnet <badGarnet@users.noreply.github.com>
19 lines
979 B
Plaintext
filename                                                doctype  connector                                         cct-accuracy  cct-%missing
fake-text.txt                                           txt      Sharepoint                                        1.0           0.0
ideas-page.html                                         html     Sharepoint                                        0.93          0.033
stanley-cups.xlsx                                       xlsx     Sharepoint                                        0.778         0.0
Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf  pdf      azure                                             0.981         0.005
IRS-form-1987.pdf                                       pdf      azure                                             0.794         0.135
spring-weather.html                                     html     azure                                             0.0           0.018
example-10k.html                                        html     local                                             0.727         0.037
fake-html-cp1252.html                                   html     local                                             0.659         0.0
ideas-page.html                                         html     local                                             0.93          0.033
UDHR_first_article_all.txt                              txt      local-single-file                                 0.995         0.0
handbook-1p.docx                                        docx     local-single-file-basic-chunking                  0.858         0.029
fake-html-cp1252.html                                   html     local-single-file-with-encoding                   0.659         0.0
layout-parser-paper-with-table.jpg                      jpg      local-single-file-with-pdf-infer-table-structure  0.716         0.032
layout-parser-paper.pdf                                 pdf      local-single-file-with-pdf-infer-table-structure  0.949         0.029
2023-Jan-economic-outlook.pdf                           pdf      s3                                                0.834         0.054
page-with-formula.pdf                                   pdf      s3                                                0.971         0.021
recalibrating-risk-report.pdf                           pdf      s3                                                0.966         0.009
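
A minimal sketch of how a metrics table like the one above could be read and summarized. It assumes the underlying file is tab-separated with the five column names shown. The helper names (`load_rows`, `low_accuracy`, `mean_accuracy_by`) and the 0.5 threshold are hypothetical and not part of `unstructured`. The embedded sample copies four rows from the table.

```python
import csv
import io
from collections import defaultdict

# Four rows copied from the table above; tab separation is an assumption.
SAMPLE_TSV = (
    "filename\tdoctype\tconnector\tcct-accuracy\tcct-%missing\n"
    "fake-text.txt\ttxt\tSharepoint\t1.0\t0.0\n"
    "ideas-page.html\thtml\tSharepoint\t0.93\t0.033\n"
    "spring-weather.html\thtml\tazure\t0.0\t0.018\n"
    "IRS-form-1987.pdf\tpdf\tazure\t0.794\t0.135\n"
)


def load_rows(text):
    """Parse tab-separated text into dicts, converting the metric columns to float."""
    reader = csv.DictReader(io.StringIO(text), delimiter="\t")
    rows = []
    for row in reader:
        row["cct-accuracy"] = float(row["cct-accuracy"])
        row["cct-%missing"] = float(row["cct-%missing"])
        rows.append(row)
    return rows


def low_accuracy(rows, threshold=0.5):
    """Return filenames whose accuracy falls below a (hypothetical) threshold."""
    return [r["filename"] for r in rows if r["cct-accuracy"] < threshold]


def mean_accuracy_by(rows, key):
    """Average cct-accuracy grouped by a column such as 'doctype' or 'connector'."""
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r["cct-accuracy"])
    return {name: sum(vals) / len(vals) for name, vals in groups.items()}


rows = load_rows(SAMPLE_TSV)
print(low_accuracy(rows))
print(mean_accuracy_by(rows, "doctype"))
```

Grouping by `connector` instead of `doctype` shows whether a low score such as the one for `spring-weather.html` is isolated or shared across a source.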