yujunjun/unstructured
mirror of https://github.com/Unstructured-IO/unstructured.git synced 2025-07-25 18:05:19 +00:00
unstructured/test_unstructured_ingest/metrics/all-docs-cct.tsv

6 lines
240 B
Plaintext

chore: add doctype to ingest evaluation functions (#1977)

### Summary
To combine ingest and holistic metrics efforts, add the `doctype` field to the results from the functions in evaluate.py for use in subsequent aggregation functions.

### Test
Run `sh ./test_unstructured_ingest/evaluation-metrics.sh text-extraction` and there will be a new doctype column with the file's doctype extension.

<img width="508" alt="Screenshot 2023-11-01 at 2 23 11 PM" src="https://github.com/Unstructured-IO/unstructured/assets/42684285/44583da9-e7ef-4142-be72-c2247b954bcf">

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: shreyanid <shreyanid@users.noreply.github.com>
2023-11-02 12:15:53 -07:00
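The commit describes the new `doctype` column as the file's extension. A minimal sketch of that idea follows; the helper name `doctype_from_filename` is hypothetical and is not taken from evaluate.py, whose actual implementation is not shown here.

```python
from pathlib import Path


def doctype_from_filename(filename: str) -> str:
    """Return the file extension without the leading dot, lowercased.

    Hypothetical helper for illustration only; the real logic in
    evaluate.py may differ.
    """
    return Path(filename).suffix.lstrip(".").lower()


if __name__ == "__main__":
    for name in ("science-exploration-1p.pptx", "example-10k.html", "IRS-form-1987.pdf"):
        print(name, doctype_from_filename(name))
```

Using `Path.suffix` keeps only the final extension, so a name with several dots would yield just the last component.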
filename	doctype	connector	cct-accuracy	cct-%missing
science-exploration-1p.pptx	pptx	dropbox	0.861	0.093
science-exploration-1p.pptx	pptx	box	0.861	0.093
example-10k.html	html	local	0.686	0.037
IRS-form-1987.pdf	pdf	azure	0.783	0.135
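The commit message mentions using `doctype` in subsequent aggregation functions. As a rough sketch of what such an aggregation could look like, the snippet below reads the rows above as tab-separated text and averages both metrics per doctype. The function name `mean_by_doctype` and the choice of a simple mean are assumptions for illustration, not the repository's actual aggregation code.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Rows copied from all-docs-cct.tsv, tab-separated.
TSV_TEXT = (
    "filename\tdoctype\tconnector\tcct-accuracy\tcct-%missing\n"
    "science-exploration-1p.pptx\tpptx\tdropbox\t0.861\t0.093\n"
    "science-exploration-1p.pptx\tpptx\tbox\t0.861\t0.093\n"
    "example-10k.html\thtml\tlocal\t0.686\t0.037\n"
    "IRS-form-1987.pdf\tpdf\tazure\t0.783\t0.135\n"
)


def mean_by_doctype(tsv_text: str) -> dict:
    """Group rows by doctype and average each metric (hypothetical aggregation)."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    accuracy = defaultdict(list)
    missing = defaultdict(list)
    for row in reader:
        accuracy[row["doctype"]].append(float(row["cct-accuracy"]))
        missing[row["doctype"]].append(float(row["cct-%missing"]))
    return {
        doctype: {
            "cct-accuracy": mean(accuracy[doctype]),
            "cct-%missing": mean(missing[doctype]),
        }
        for doctype in accuracy
    }


if __name__ == "__main__":
    for doctype, metrics in mean_by_doctype(TSV_TEXT).items():
        print(doctype, metrics)
```

The standard-library `csv` module is used here to avoid extra dependencies; a DataFrame `groupby` would express the same grouping if pandas is already in use.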