yujunjun/unstructured
mirror of https://github.com/Unstructured-IO/unstructured.git synced 2025-09-08 08:08:29 +00:00
unstructured/test_unstructured_ingest/metrics/all-docs-cct.tsv


build: text extraction evaluation metrics workflow added (#1757)

**Executive Summary**
This PR adds evaluation metrics to our current workflow. When code is pushed, the workflow evaluates the output against our gold standard and writes the results to `.tsv` files.

**Technical Details**
- Adds evaluation metrics to the test-ingest workflow.
- Uses the `structured-output` from `test-ingest` and compares it against the gold standard stored in S3, which is downloaded locally for the comparison. The folder currently in use is `s3://utic-dev-tech-fixtures/small-cct`; this directory can be changed in the shell script.
- With this PR, only one file per connector is used for the comparison.

**Misc**
- Few files currently overlap between test-ingest and the gold standard. More files will be added.

**Outputs**
Two `.tsv` files are saved under `test_unstructured_ingest/metrics/`.
![image](https://github.com/Unstructured-IO/unstructured/assets/2177850/222e437c-1a94-4d7c-9320-81696633b1ae)
![image](https://github.com/Unstructured-IO/unstructured/assets/2177850/5c840322-6739-4634-8868-eba04b4ebc96)

---------
Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: Klaijan <Klaijan@users.noreply.github.com>
2023-10-23 17:39:22 -04:00
filename connector cct-accuracy cct-%missing
example-10k.html local 0.686 0.04
science-exploration-1p.pptx box 0.861 0.09
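The following minimal Python sketch illustrates what the `cct-accuracy` and `cct-%missing` columns could measure and how a row in this TSV layout might be produced and read back. It is not unstructured's implementation, and several pieces are assumptions. The function names (`cct_accuracy`, `cct_percent_missing`, `metrics_row`, `write_tsv`, `read_tsv`) are hypothetical. Accuracy is approximated with `difflib.SequenceMatcher.ratio()` in place of whatever edit-distance score the workflow actually uses. "% missing" is assumed to be the fraction of gold-standard words absent from the extracted text, reported as a fraction as the sample values suggest. The rounding only mimics the rows above.

```python
import csv
import difflib
import io
from collections import Counter

# Column names copied from the metrics TSV header.
HEADER = ["filename", "connector", "cct-accuracy", "cct-%missing"]


def cct_accuracy(output_text: str, gold_text: str) -> float:
    """Similarity in [0, 1] between extracted text and the gold standard.

    Assumption: difflib's ratio() (2 * matches / total length) stands in for
    an edit-distance score; the workflow's own metric may differ.
    """
    # autojunk=False keeps difflib from ignoring frequent characters in long texts.
    matcher = difflib.SequenceMatcher(None, output_text, gold_text, autojunk=False)
    return matcher.ratio()


def cct_percent_missing(output_text: str, gold_text: str) -> float:
    """Fraction of gold-standard words (counted with multiplicity) missing from the output.

    Assumption: bag-of-words comparison on whitespace-split tokens.
    """
    gold_counts = Counter(gold_text.split())
    output_counts = Counter(output_text.split())
    total = sum(gold_counts.values())
    if total == 0:
        return 0.0
    missing = sum(
        max(count - output_counts[word], 0) for word, count in gold_counts.items()
    )
    return missing / total


def metrics_row(filename: str, connector: str, output_text: str, gold_text: str) -> list:
    """Build one TSV row; rounding mirrors the sample rows (an assumption)."""
    return [
        filename,
        connector,
        f"{cct_accuracy(output_text, gold_text):.3f}",
        f"{cct_percent_missing(output_text, gold_text):.2f}",
    ]


def write_tsv(rows: list) -> str:
    """Serialize rows under the metrics header as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()


def read_tsv(text: str) -> list:
    """Parse metrics TSV text into a list of dicts keyed by column name."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


if __name__ == "__main__":
    row = metrics_row("doc.txt", "local", "the quick brown fox", "the quick brown fox jumps")
    print(write_tsv([row]))
```

For example, `metrics_row("doc.txt", "local", "the quick brown fox", "the quick brown fox jumps")` returns `["doc.txt", "local", "0.864", "0.20"]`. The 19 matching characters out of 44 total give a ratio of 38/44, and one of the five gold words ("jumps") is missing. Keeping values as strings in the TSV preserves the exact rounding that was written.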