Yao You 97fb10db4a
fix: default hi_res model rely on inference setting (#2441)
- the default `hi_res_model_name` is set in multiple places across both
`unstructured` and `unstructured-inference`
- these competing defaults lead to inconsistent and unexpected behavior
- this fix removes the helper in `unstructured` that tries to set the
default hi_res layout detection model; instead we rely on
`unstructured-inference` to provide that default when no explicit model
name is passed in
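
The idea of keeping the default in one place can be sketched as follows. This is a minimal illustration, not the actual code from either package: `INFERENCE_DEFAULT_MODEL`, `get_model_name`, and `partition_hi_res` are hypothetical stand-ins, and the `"yolox"` value is taken only from the assertion in the test section of this commit message.

```python
from typing import Optional

# Hypothetical stand-in for the inference package: the only place a default lives.
INFERENCE_DEFAULT_MODEL = "yolox"


def get_model_name(model_name: Optional[str] = None) -> str:
    """Inference side (hypothetical): fall back to the default only for None."""
    return INFERENCE_DEFAULT_MODEL if model_name is None else model_name


def partition_hi_res(hi_res_model_name: Optional[str] = None) -> str:
    """Caller side (hypothetical): forward the argument untouched, even if None."""
    # No second default is chosen here, so the two layers cannot disagree.
    return get_model_name(hi_res_model_name)
```

Because the caller never substitutes its own value, changing the default in the inference layer is enough to change it everywhere, while an explicitly passed model name is still honored.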

## test

```bash
UNSTRUCTURED_INCLUDE_DEBUG_METADATA=true ipython
```

```python
from unstructured.partition.auto import partition

# replace "foo.pdf" with the path to any local PDF file
elements = partition("foo.pdf", strategy="hi_res")
assert elements[0].metadata.detection_origin == "yolox"
```

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: badGarnet <badGarnet@users.noreply.github.com>
2024-01-29 16:44:41 +00:00


| filename | doctype | connector | cct-accuracy | cct-%missing |
|---|---|---|---|---|
| fake-text.txt | txt | Sharepoint | 1.0 | 0.0 |
| ideas-page.html | html | Sharepoint | 0.93 | 0.033 |
| stanley-cups.xlsx | xlsx | Sharepoint | 0.778 | 0.0 |
| Core-Skills-for-Biomedical-Data-Scientists-2-pages.pdf | pdf | azure | 0.981 | 0.005 |
| IRS-form-1987.pdf | pdf | azure | 0.794 | 0.135 |
| spring-weather.html | html | azure | 0.0 | 0.018 |
| example-10k.html | html | local | 0.727 | 0.037 |
| fake-html-cp1252.html | html | local | 0.659 | 0.0 |
| ideas-page.html | html | local | 0.93 | 0.033 |
| UDHR_first_article_all.txt | txt | local-single-file | 0.995 | 0.0 |
| handbook-1p.docx | docx | local-single-file-basic-chunking | 0.858 | 0.029 |
| fake-html-cp1252.html | html | local-single-file-with-encoding | 0.659 | 0.0 |
| layout-parser-paper-with-table.jpg | jpg | local-single-file-with-pdf-infer-table-structure | 0.716 | 0.032 |
| layout-parser-paper.pdf | pdf | local-single-file-with-pdf-infer-table-structure | 0.949 | 0.029 |
| 2023-Jan-economic-outlook.pdf | pdf | s3 | 0.834 | 0.054 |
| page-with-formula.pdf | pdf | s3 | 0.971 | 0.021 |
| recalibrating-risk-report.pdf | pdf | s3 | 0.966 | 0.009 |