unstructured

yujunjun/unstructured

mirror of https://github.com/Unstructured-IO/unstructured.git synced 2025-11-09 15:07:30 +00:00

Commit Graph

Author	SHA1	Message	Date

Yao You

3b9b01c502

Feat: weighted average table metrics (#3348)

This PR replaces the unweighted average used for table metrics with an
average weighted by the number of actual (ground truth) tables.

- on pages that have ground truth tables, the weight is proportional
to the number of ground truth tables on that page
- pages that have no ground truth tables but do have predicted tables
(false positives) are assigned one table's worth of weight for the whole
page when calculating the mean value of `table_level_acc`
- pages with false positive tables do not contribute to the table
structural or table content metrics
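
The weighting rules above can be illustrated with a minimal sketch. This is not the code from the PR: `PageResult`, `page_weight`, and `weighted_mean` are hypothetical names, only two of the metrics are modelled, and giving zero weight to pages with neither ground truth nor predicted tables is an assumption the commit message does not spell out.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class PageResult:
    """Hypothetical per-page record; field names are illustrative only."""
    n_gt_tables: int                      # ground truth tables on the page
    n_pred_tables: int                    # predicted tables on the page
    table_level_acc: float                # detection metric
    element_col_level_content_acc: float  # one of the content metrics


def page_weight(page: PageResult, metric: str) -> float:
    """Weight of one page for one metric, following the three rules above."""
    if page.n_gt_tables > 0:
        # Rule 1: weight is proportional to the number of ground truth tables.
        return float(page.n_gt_tables)
    if page.n_pred_tables > 0 and metric == "table_level_acc":
        # Rule 2: a false positive page counts as one table, detection metric only.
        return 1.0
    # Rule 3: false positive pages add nothing to structure or content metrics.
    # Assumption: pages with no tables at all are also given zero weight.
    return 0.0


def weighted_mean(pages: Sequence[PageResult], metric: str) -> Optional[float]:
    """Weighted average of `metric`; None when no page carries any weight."""
    total_weight = 0.0
    weighted_sum = 0.0
    for page in pages:
        weight = page_weight(page, metric)
        total_weight += weight
        weighted_sum += weight * getattr(page, metric)
    return weighted_sum / total_weight if total_weight > 0 else None


pages = [
    PageResult(2, 2, 1.0, 1.0),   # two ground truth tables
    PageResult(1, 1, 0.5, 0.25),  # one ground truth table
    PageResult(0, 1, 0.0, 0.0),   # false positive page
]
detection = weighted_mean(pages, "table_level_acc")              # 2.5 / 4
content = weighted_mean(pages, "element_col_level_content_acc")  # 2.25 / 3
```

In this sketch the false positive page lowers `table_level_acc` (0.625 rather than the 0.833 obtained from the two ground truth pages alone) while leaving the content metric at 0.75.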

## test

This PR updates the existing test for evaluating table metrics:
- adds a second file with just 1 table, alongside the existing file
with 2 tables
- tests that the weighted average is written to the report

2024-11-20 17:14:57 +00:00

Pawel Kmiecik

ff9d46f9dc

feat(eval): table evaluation metrics (#2558)

This PR adds new table evaluation metrics prepared by @leah1985.
The metrics include:
- `table count` (check)
- `table_level_acc` - accuracy of table detection
- `element_col_level_index_acc` - accuracy of cell detection in columns
- `element_row_level_index_acc` - accuracy of cell detection in rows
- `element_col_level_content_acc` - accuracy of content detected in
columns
- `element_row_level_content_acc` - accuracy of content detected in rows

TODO in next steps:
- create a minimal dataset and upload to s3 for ingest tests
- generate and add metrics on the above dataset to
`test_unstructured_ingest/metrics`

2024-02-22 16:35:46 +00:00

2 Commits