mirror of
https://github.com/Unstructured-IO/unstructured.git
synced 2025-09-26 08:53:15 +00:00

### Summary

To combine the ingest and holistic metrics efforts, add the `doctype` field to the results returned by the functions in evaluate.py, for use in subsequent aggregation functions.

### Test

Run `sh ./test_unstructured_ingest/evaluation-metrics.sh text-extraction`; the output will contain a new doctype column holding each file's doctype (its file extension).

<img width="508" alt="Screenshot 2023-11-01 at 2 23 11 PM" src="https://github.com/Unstructured-IO/unstructured/assets/42684285/44583da9-e7ef-4142-be72-c2247b954bcf">

---------

Co-authored-by: ryannikolaidis <1208590+ryannikolaidis@users.noreply.github.com>
Co-authored-by: shreyanid <shreyanid@users.noreply.github.com>
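The `doctype` field described in the summary can be pictured as the file extension attached to each result row. The following is a minimal sketch under that assumption; `doctype_of` and `add_doctype` are hypothetical helpers written for illustration and are not the functions in evaluate.py.

```python
from pathlib import Path


def doctype_of(filename: str) -> str:
    """Return the lowercase file extension without the dot, or "" if there is none."""
    return Path(filename).suffix.lstrip(".").lower()


def add_doctype(rows: list[dict]) -> list[dict]:
    """Return copies of the rows with a 'doctype' key derived from 'filename'."""
    return [{**row, "doctype": doctype_of(row["filename"])} for row in rows]


# Hypothetical result rows, shaped like the metrics table in this file.
results = add_doctype(
    [
        {"filename": "science-exploration-1p.pptx", "connector": "dropbox"},
        {"filename": "example-10k.html", "connector": "local"},
    ]
)
for row in results:
    print(row["filename"], row["doctype"])
```

Deriving the value from the filename keeps the field consistent across connectors, since the same document fetched from different sources gets the same doctype.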
240 B
filename | doctype | connector | cct-accuracy | cct-%missing |
---|---|---|---|---|
science-exploration-1p.pptx | pptx | dropbox | 0.861 | 0.093 |
science-exploration-1p.pptx | pptx | box | 0.861 | 0.093 |
example-10k.html | html | local | 0.686 | 0.037 |
IRS-form-1987.pdf | pdf | azure | 0.783 | 0.135 |
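As an illustration of the kind of aggregation the `doctype` column enables, the sketch below averages one metric per doctype. It assumes the file is tab-separated with the header shown above, and that the PDF row's doctype is `pdf` (taken from its extension). `mean_by_doctype` is a hypothetical helper, not part of the repository.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Sample data mirroring the table above; tab-separated layout is an assumption.
TSV = (
    "filename\tdoctype\tconnector\tcct-accuracy\tcct-%missing\n"
    "science-exploration-1p.pptx\tpptx\tdropbox\t0.861\t0.093\n"
    "science-exploration-1p.pptx\tpptx\tbox\t0.861\t0.093\n"
    "example-10k.html\thtml\tlocal\t0.686\t0.037\n"
    "IRS-form-1987.pdf\tpdf\tazure\t0.783\t0.135\n"
)


def mean_by_doctype(tsv_text: str, metric: str = "cct-accuracy") -> dict[str, float]:
    """Group rows by doctype and return the mean of the chosen metric per group."""
    groups: dict[str, list[float]] = defaultdict(list)
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        groups[row["doctype"]].append(float(row[metric]))
    return {doctype: round(mean(values), 3) for doctype, values in groups.items()}


accuracy = mean_by_doctype(TSV)
missing = mean_by_doctype(TSV, metric="cct-%missing")
print(accuracy)
print(missing)
```

Grouping on `doctype` rather than `filename` lets results from different connectors for the same kind of document fall into one bucket.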